Reduced complexity nucleic acid targets and methods of using same

ABSTRACT

The invention provides a method of measuring the level of two or more nucleic acid molecules in a target by contacting a probe with a target comprising two or more nucleic acid molecules, wherein the nucleic acid molecules are arbitrarily sampled and wherein the arbitrarily sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and detecting the amount of specific binding of the target to the probe. The invention also provides a method of measuring the level of two or more nucleic acid molecules in a target by contacting a probe with a target comprising two or more nucleic acid molecules, wherein the nucleic acid molecules are statistically sampled and wherein the statistically sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and detecting the amount of specific binding of the target to the probe.

[0001] This application claims the benefit of priority of provisional application serial No. 60/083,331, filed Apr. 27, 1998, No. 60/098,070, filed Aug. 27, 1998, and No. 60/118,624, filed Feb. 4, 1999, each of which is incorporated herein by reference.

[0002] This invention was made with government support under grant number CA68822, NS33377, AI34829 awarded by the National Institutes of Health and under grant number BC961294 awarded by the Department of Defense. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] The present invention relates generally to methods of measuring nucleic acid molecules in a target and more specifically, to methods of detecting differential gene expression.

[0004] Every living organism requires genetic material, deoxyribonucleic acid (DNA), which contains genes that impart a unique collection of characteristics to the organism. DNA is composed of two strands of complementary sequences of nucleotide building blocks. The two strands bind, or hybridize, with the complementary sequence to form a double helix. Genes are discrete segments of the DNA and provide the information required to generate a new organism and to give that organism its unique characteristics. Even simple organisms, such as bacteria, contain thousands of genes, and the number is many fold greater in complex organisms such as humans. Understanding the complexities of the development and functioning of living organisms requires knowledge of these genes.

[0005] For many years, scientists have searched for and identified a number of genes important in the development and function of living organisms. The search for new genes has greatly accelerated in recent years due to directed projects aimed at identifying genetic information with the ultimate goal being the determination of the entire genome of an organism and its encoded genes, termed genomic studies. One of the most ambitious of these genomic projects has been the Human Genome Project, with the goal of sequencing the entire human genome. Recent advances in sequencing technology have led to a rapid accumulation of genetic information, which is available in both public and private databases. These newly discovered genes as well as those genes soon to be discovered provide a rich resource of potential targets for the development of new drugs.

[0006] Despite the rapid pace of gene discovery, there remains a formidable task of characterizing these genes and determining the biological function of these genes. The characterization of newly discovered genes is often a time consuming and laborious undertaking, sometimes taking years to determine the function of a gene or its gene product, particularly in complex higher organisms.

[0007] Another level of complexity arises when complex interactions between genes and their gene products are contemplated. To understand how an organism works, it is important not only to understand what role a gene, its transcript and its gene product play in the workings of an organism, but also to understand potentially complex interactions between the gene, its transcript, or its gene product and other genes and their gene products.

[0008] A number of approaches have been used to assess gene expression in a particular cell or tissue of an organism. These approaches have been used to characterize gene expression under various conditions, including looking at differences in expression under differing conditions. However, most of these methods are useful for detecting abundant transcripts but have proven less useful for detecting transcripts of low abundance, particularly when looking at the expression of a number of genes rather than a selected few genes. Since genes expressed at low levels often regulate the physiological pathways in a cell, it is desirable to detect transcripts having low abundance.

[0009] Thus, a need exists for a method to characterize the expression pattern of genes under a given set of conditions and to detect low abundance transcripts. The present invention satisfies this need and provides related advantages as well.

SUMMARY OF THE INVENTION

[0010] The invention provides a method of measuring the level of two or more nucleic acid molecules in a target by contacting a probe with a target comprising two or more nucleic acid molecules, wherein the nucleic acid molecules are arbitrarily sampled and wherein the arbitrarily sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and detecting the amount of specific binding of the target to the probe. The invention also provides a method of measuring the level of two or more nucleic acid molecules in a target by contacting a probe with a target comprising two or more nucleic acid molecules, wherein the nucleic acid molecules are statistically sampled and wherein the statistically sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and detecting the amount of specific binding of the target to the probe.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 shows differential hybridization to clone arrays. Each image is an autoradiogram that spans about 4000 double spotted E. coli colonies, each carrying a different EST clone. Panel A shows the binding of a total target made from 1 μg of polyA⁺ RNA from confluent human keratinocytes that was radiolabeled during reverse transcription. Panels B and C show RAP-PCR fingerprint with a pair of arbitrary primers that was performed on cDNA from oligo(dT) primed cDNA of confluent human keratinocytes that were untreated (Panel B) and treated with epidermal growth factor (EGF) (Panel C). The two radiolabeled colonies from one differentially expressed cDNA are indicated with an arrow. Panel D shows a RAP-PCR fingerprint with a different pair of arbitrary primers that was performed on RNA from confluent human keratinocytes.

[0012] FIG. 2 shows RAP-PCR fingerprints resolved on a polyacrylamide-urea gel. Reverse transcription was performed with an oligo-dT primer on 250, 125, 62.5 and 31.25 ng RNA in lanes 1, 2, 3, and 4 respectively. RNA was from untreated, TGF-β and EGF treated HaCaT cells, as indicated. RAP-PCR was performed with two sets of primers, primers GP14 and GP16 (Panel A) or Nucl+ and OPN24 (Panel B). Molecular weight markers are indicated on the left of each panel, and the sizes of the two differentially amplified RAP-PCR-products are indicated with arrows (317 and 291).

[0013] FIG. 3 shows hybridization of targets generated by RAP-PCR to arrays. Shown are autoradiograms of the bottom half of duplicates of the same filter (Genome Systems) hybridized with radiolabeled DNA. Panels A and B show hybridization of two RAP-PCR reactions generated using the same primers and derived from untreated (Panel A) and EGF treated (Panel B) HaCaT cells. Three double-spotted clones that show differential hybridization signals are marked on each array. The GenBank accession numbers of the clone and the corresponding genes are H10045 and H10098, corresponding to vav-3 and AF067817 (square); H28735, gene unknown, similar to heparan sulfate 3-O-sulfotransferase-1, AF019386 (circle); R48633, gene unknown (diamond). Panel C shows an array hybridized with a RAP-PCR target generated using the same RNA as in panel A but with a different pair of primers. Panel D shows an array hybridized with cDNA target generated by reverse transcription of 1 μg poly(A)⁺-selected mRNA. Panel E shows an array hybridized with human genomic DNA labeled using random priming.

[0014] FIG. 4 shows resolution of RT-PCR products on polyacrylamide-urea gels and confirmation of differential regulation in response to EGF using low stringency RT-PCR. Reverse transcription was performed at two RNA concentrations (500 ng, left column; 250 ng, right column) at different cycle numbers. Shown are bands for the control (22 cycles); for GenBank accession number H11520 (22 cycles); for TSC-22, corresponding to GenBank accession numbers H11073 and H11161 (19 cycles); and for R48633 (19 cycles).

[0015] FIG. 5 shows differential display of untreated and EGF treated HaCaT cells. Panel A shows differential display reactions performed at four different starting concentrations of total RNA (designated 1, 2, 3 and 4 and corresponding to 800, 400, 200 and 100 ng, respectively), which was then used for PCR. An anchored oligo(dT) primer, H-T₁₁C or H-T₁₁A, was used in combination with one of two different arbitrary primers, H-AP3 or H-AP4, which are indicated above the lanes. Panel B shows differential display using the arbitrary primer KA2 with three different anchored oligo(dT) primers, T₁₃V, AT₁₅A and GT₁₅G, used at four different starting concentrations of RNA (designated 1, 2, 3 and 4 and corresponding to 1000, 500, 250 and 125 ng, respectively), which was then used for PCR.

[0016] FIG. 6 shows hybridization of differential display reactions to cDNA arrays. Differential display products generated with the primers GT₁₅G and KA2 from untreated (Panel A) and EGF treated (Panel B) HaCaT cells were labeled by random priming and hybridized to cDNA arrays. A section representing less than 5% of a membrane is shown with a differentially regulated gene indicated by an arrow. Panel C shows hybridization of differential display products generated with the primers AT₁₅A and KA2 from untreated HaCaT cells.

[0017] FIG. 7 shows confirmation of differential regulation of genes by EGF using low stringency RT-PCR. Reverse transcription was performed at twofold different RNA concentrations, and low stringency PCR was performed at different cycle numbers. The amount of input RNA used for initial first strand cDNA synthesis and used in each RAP-PCR reaction was 125 ng, left column and 250 ng, right column. The RT-PCR products from 19 cycle reactions were resolved on polyacrylamide-urea gels. Shown are the products for the control (unregulated) and genes exhibiting ≥1.6-fold regulation in response to EGF, corresponding to GenBank accession numbers R72714, H14529, H27389, H05545, H27969, R73247, and H21777.

[0018] FIG. 8 shows the nucleotide sequence for GenBank accession number H11520 (SEQ ID NO:1).

[0019] FIG. 9 shows the nucleotide sequence for GenBank accession number H11161 (SEQ ID NO:2).

[0020] FIG. 10 shows the nucleotide sequence for GenBank accession number H11073 (SEQ ID NO:3).

[0021] FIG. 11 shows the nucleotide sequence for GenBank accession number U35048 (SEQ ID NO:4).

[0022] FIG. 12 shows the nucleotide sequence for GenBank accession number R48633 (SEQ ID NO:5).

[0023] FIG. 13 shows the nucleotide sequence for GenBank accession number H28735 (SEQ ID NO:6).

[0024] FIG. 14 shows the nucleotide sequence for GenBank accession number AF019386 (SEQ ID NO:7).

[0025] FIG. 15 shows the nucleotide sequence for GenBank accession number H25513 (SEQ ID NO:8).

[0026] FIG. 16 shows the nucleotide sequence for GenBank accession number H25514 (SEQ ID NO:9).

[0027] FIG. 17 shows the nucleotide sequence for GenBank accession number M13918 (SEQ ID NO:10).

[0028] FIG. 18 shows the nucleotide sequence for GenBank accession number H12999 (SEQ ID NO:11).

[0029] FIG. 19 shows the nucleotide sequence for GenBank accession number H05639 (SEQ ID NO:12).

[0030] FIG. 20 shows the nucleotide sequence for GenBank accession number L49207 (SEQ ID NO:13).

[0031] FIG. 21 shows the nucleotide sequence for GenBank accession number H15184 (SEQ ID NO:14).

[0032] FIG. 22 shows the nucleotide sequence for GenBank accession number H15124 (SEQ ID NO:15). FIG. 23 shows the nucleotide sequence for GenBank accession number X79781 (SEQ ID NO:16).

[0033] FIG. 24 shows the nucleotide sequence for GenBank accession number H25195 (SEQ ID NO:17).

[0034] FIG. 25 shows the nucleotide sequence for GenBank accession number H24377 (SEQ ID NO:18).

[0035] FIG. 26 shows the nucleotide sequence for GenBank accession number M31627 (SEQ ID NO:19).

[0036] FIG. 27 shows the nucleotide sequence for GenBank accession number H23972 (SEQ ID NO:20).

[0037] FIG. 28 shows the nucleotide sequence for GenBank accession number H27350 (SEQ ID NO:21).

[0038] FIG. 29 shows the nucleotide sequence for GenBank accession number AB000712 (SEQ ID NO:22).

[0039] FIG. 30 shows the nucleotide sequence for GenBank accession number R75916 (SEQ ID NO:23).

[0040] FIG. 31 shows the nucleotide sequence for GenBank accession number X85992 (SEQ ID NO:24).

[0041] FIG. 32 shows the nucleotide sequence for GenBank accession number R73021 (SEQ ID NO:25).

[0042] FIG. 33 shows the nucleotide sequence for GenBank accession number R73022 (SEQ ID NO:26).

[0043] FIG. 34 shows the nucleotide sequence for GenBank accession number U66894 (SEQ ID NO:27).

[0044] FIG. 35 shows the nucleotide sequence for GenBank accession number H10098 (SEQ ID NO:28).

[0045] FIG. 36 shows the nucleotide sequence for GenBank accession number H10045 (SEQ ID NO:29).

[0046] FIG. 37 shows the nucleotide sequence for GenBank accession number AF067817 (SEQ ID NO:30).

[0047] FIG. 38 shows the nucleotide sequence for GenBank accession number R72714 (SEQ ID NO:31).

[0048] FIG. 39 shows the nucleotide sequence for GenBank accession number X52541 (SEQ ID NO:32).

[0049] FIG. 40 shows the nucleotide sequence for GenBank accession number H14529 (SEQ ID NO:33).

[0050] FIG. 41 shows the nucleotide sequence for GenBank accession number M10277 (SEQ ID NO:34).

[0051] FIG. 42 shows the nucleotide sequence for GenBank accession number H27389 (SEQ ID NO:35).

[0052] FIG. 43 shows the nucleotide sequence for GenBank accession number D89092 (SEQ ID NO:36).

[0053] FIG. 44 shows the nucleotide sequence for GenBank accession number D89678 (SEQ ID NO:37).

[0054] FIG. 45 shows the nucleotide sequence for GenBank accession number H05545 (SEQ ID NO:38).

[0055] FIG. 46 shows the nucleotide sequence for GenBank accession number J03804 (SEQ ID NO:39).

[0056] FIG. 47 shows the nucleotide sequence for GenBank accession number H27969 (SEQ ID NO:40).

[0057] FIG. 48 shows the nucleotide sequence for GenBank accession number R73247 (SEQ ID NO:41).

[0058] FIG. 49 shows the nucleotide sequence for GenBank accession number U51336 (SEQ ID NO:42).

[0059] FIG. 50 shows the nucleotide sequence for GenBank accession number H21777 (SEQ ID NO:43).

[0060] FIG. 51 shows the nucleotide sequence for GenBank accession number K00558 (SEQ ID NO:44).

[0061] FIG. 52 shows the nucleotide sequence for GenBank accession number D31765 (SEQ ID NO:45).

DETAILED DESCRIPTION OF THE INVENTION

[0062] The invention provides methods for measuring the level of two or more nucleic acid molecules in a target by contacting a probe with an arbitrarily sampled target or a statistically sampled target and detecting the amount of specific binding to the probe. The invention also provides methods of identifying two or more differentially expressed nucleic acid molecules associated with a condition by measuring the level of two or more nucleic acid molecules in a target and comparing the expression levels to expression levels of the nucleic acid molecules in a second target. The methods of the invention are useful for obtaining a profile of nucleic acid molecules expressed in a target under a given set of conditions. The methods of the invention are particularly useful for comparing the relative abundance of low abundance nucleic acid molecules between two or more targets. The methods of the invention are advantageous in that a profile of nucleic acid molecule abundance can be determined and correlated with a given set of conditions or compared to another target to determine if the original target was exposed to a particular set of conditions, thereby providing information useful for assessing the diagnosis or treatment of a disease.

[0063] The invention provides a method of measuring the abundance of two or more nucleic acid molecules in a target. The method of the invention includes the steps of contacting a probe with a target comprising two or more nucleic acid molecules, wherein the nucleic acid molecules are arbitrarily sampled and wherein the arbitrarily sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and detecting the amount of specific binding of the target to the probe.

[0064] As used herein, the term “nucleic acid molecule” refers to a nucleic acid of two or more nucleotides. A nucleic acid molecule can be RNA or DNA. For example, a nucleic acid molecule can include messenger RNA (mRNA), transfer RNA (tRNA) or ribosomal RNA (rRNA). A nucleic acid molecule can also include, for example, genomic DNA or cDNA. A nucleic acid molecule can be synthesized enzymatically, either in vivo or in vitro, or the nucleic acid molecule can be chemically synthesized by methods well known in the art. A nucleic acid molecule can also contain modified bases, for example, the modified bases found in tRNA such as inosine, methylinosine, dihydrouridine, ribothymidine, pseudouridine, methylguanosine and dimethylguanosine. Furthermore, a chemically synthesized nucleic acid molecule can incorporate derivatives of nucleotide bases.

[0065] As used herein, the term “population of nucleic acid molecules” refers to a group of two or more different nucleic acid molecules. A population of nucleic acid molecules can also be 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 1000 or more or even 10,000 or more different nucleic acid molecules. The nucleic acid molecules can differ, for example, by a single nucleotide or by modification of a single base. Generally, a population of nucleic acid molecules is obtained from a target sample, for example, a cell, tissue or organism. In such a case, the population of nucleic acid molecules contains the nucleic acid molecules of the target sample.

[0066] A population of nucleic acid molecules has characteristics that can differentiate one population of nucleic acid molecules from another. These characteristics are based on the number and nature of individual nucleic acid molecules comprising the population. Such characteristics include, for example, the abundance of nucleic acid molecules in the population. The abundance of an individual nucleic acid molecule can be an absolute amount in a given target sample or can be the amount relative to other nucleic acid molecules in the target sample. In a population of nucleic acid molecules obtained from a target, individual nucleic acid molecules can be more abundant or less abundant relative to other nucleic acid molecules in the sample target. A nucleic acid molecule can also be less abundant in one sample relative to another sample.

[0067] As used herein, a less abundant nucleic acid molecule can be, for example, less than about 10% as abundant as the most abundant nucleic acid molecule in a population. A less abundant nucleic acid molecule can also be less than about 1% as abundant, less than about 0.1% as abundant or less than about 0.01% as abundant as the most abundant nucleic acid molecule in a population. For example, a low abundance nucleic acid molecule can be less than about 10 copies per cell, or even as low as 1 copy per cell.
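
The relative abundance thresholds recited above can be illustrated with a minimal sketch. The function names, the transcript labels and the copy numbers below are hypothetical and are chosen only for illustration; they are not taken from the Examples.

```python
def relative_abundance(copies_per_cell):
    """Express each molecule's abundance as a fraction of the most abundant molecule."""
    top = max(copies_per_cell.values())
    return {name: count / top for name, count in copies_per_cell.items()}


def less_abundant(copies_per_cell, threshold=0.10):
    """Return the molecules whose relative abundance is below the threshold.

    A threshold of 0.10 corresponds to "less than about 10% as abundant as
    the most abundant nucleic acid molecule"; 0.01, 0.001 and 0.0001 give
    the 1%, 0.1% and 0.01% cut-offs.
    """
    relative = relative_abundance(copies_per_cell)
    return sorted(name for name, fraction in relative.items() if fraction < threshold)


# Hypothetical copy numbers per cell, for illustration only.
example = {
    "transcript_A": 5000,
    "transcript_B": 400,
    "transcript_C": 8,
    "transcript_D": 1,
}

print(less_abundant(example, threshold=0.10))
print(less_abundant(example, threshold=0.001))
```

In this sketch the same molecule can fall inside or outside the "less abundant" class depending on which cut-off is applied, which is why the threshold is left as a parameter.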

[0068] Another characteristic of a population of nucleic acid molecules is the complexity of the population. As used herein, “complexity” refers to the number of nucleic acid molecules having different sequences in the population. For example, a population of nucleic acid molecules representative of the mRNA in a bacterial cell has lower complexity than a population of nucleic acid molecules representative of the mRNA in a eukaryotic cell, a tissue or an organism because a smaller number of genes are expressed in a bacterial cell relative to a eukaryotic cell, tissue or organism.

[0069] A population of nucleic acid molecules can also be characterized by the properties of individual nucleic acid molecules in the population. For example, the length of individual nucleic acid molecules contributes to the characteristics of a population of nucleic acid molecules. Similarly, the sequence of individual nucleic acid molecules in the population contributes to the characteristics of the population of nucleic acid molecules, for example, the G+C content of the nucleic acid sequences and any secondary structure that can form due to complementary stretches of nucleotide sequence that can undergo intrastrand hybridization.
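
As a rough illustration of the population characteristics described above, the following sketch computes the complexity (the number of different sequences), the mean length and the G+C content of a small set of sequences. The function names and the example sequences are hypothetical; secondary structure is not modeled.

```python
def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def summarize_population(sequences):
    """Summarize a population by complexity, mean length and G+C content."""
    # Complexity counts molecules having different sequences, so duplicate
    # sequences (ignoring letter case) are collapsed first.
    distinct = sorted({s.upper() for s in sequences})
    return {
        "complexity": len(distinct),
        "mean_length": sum(len(s) for s in distinct) / len(distinct),
        "gc_content": {s: gc_content(s) for s in distinct},
    }


# Hypothetical sequences, for illustration only.
summary = summarize_population(["ATGC", "atgc", "GGCCAA"])
print(summary["complexity"], summary["mean_length"])
```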

[0070] As used herein, the term “subset of nucleic acids” means less than all of a set of nucleic acid molecules. For example, a subset of nucleic acid molecules of a target sample population would be less than all of the nucleic acid molecules in the target sample population. Specifically excluded from a subset of nucleic acid molecules is a group of nucleic acid molecules representative of all the nucleic acid molecules in a sample target, for example, a target generated using total cDNA or total mRNA.

[0071] As used herein, the term “target” refers to one or more nucleic acid molecules to which binding of a probe is desired. A target is detectable when bound to a probe. A target of the invention generally comprises two or more different nucleic acid molecules. A target can be derived from a population of nucleic acid molecules from a cell, tissue or organism. A target can also contain 3 or more, 5 or more, 10 or more, 20 or more, 30 or more, 50 or more, 100 or more, 200 or more, 500 or more, 1000 or more, 2000 or more, 5000 or more, or even 10,000 or more different nucleic acid molecules. A target can have a detectable moiety associated with it such as a radioactive label, a fluorescent label or any label that is detectable. When a target is labeled, for example, with a radioactive label, the target can be used “to probe” or hybridize with other nucleic acid molecules. Methods of making a target are disclosed herein.

[0072] A method of detection that directly measures binding of the target to a probe, without the need for a detectable moiety attached to the target, can also be used. In such a case, the nucleic acid molecules are directly detectable without modification of a nucleic acid molecule of the target, for example, by attaching a detectable moiety. An example of such a detection method using a target without a detectable moiety is detection of binding of a target using mass spectrometry. Another example of a method using a target containing nucleic acid molecules without an attached detectable moiety is binding the target to a probe that contains molecules having a detectable moiety. In such a case, the binding of a target to the probe containing molecules having a detectable moiety is detected and, as such, the target is detectable when bound to the probe. An example is the “molecular beacon,” where probe binding causes separation of a fluorescent tag from a fluorescence quencher.

[0073] As used herein, the term “specific binding” means binding that is measurably different from a non-specific interaction. Specific binding can be measured, for example, by determining binding of a molecule compared to binding of a control molecule, which generally is a molecule of similar structure that does not have binding activity. For example, specific binding of a target to a probe can be determined by comparing binding of the target with binding of control nucleic acids not included in the target. Specific binding can also be determined by competition with a control molecule that is similar to the target, for example, an excess of non-labeled target. In this case, specific binding is indicated if the binding of a labeled target to a probe is competitively inhibited by excess unlabeled target.

[0074] The term “specific binding,” as used herein, includes both low and high affinity specific binding. Specific binding can be exhibited, for example, by a low affinity molecule having a Kd of at least about 10⁻⁴ M. Specific binding also can be exhibited by a high affinity molecule, for example, a molecule having a Kd of at least about 10⁻⁷ M, at least about 10⁻⁸ M, at least about 10⁻⁹ M, at least about 10⁻¹⁰ M, or can have a Kd of at least about 10⁻¹¹ M or 10⁻¹² M or greater.

[0075] In the case of a probe comprising an array of nucleic acid molecules, binding of a specific nucleic acid molecule of the probe to another nucleic acid molecule is also known as hybridizing or hybridization. As used herein, the term “hybridizing” or “hybridization” refers to the ability of two strands of nucleic acid molecules to hydrogen bond in a sequence dependent manner. Under appropriate conditions, complementary nucleotide sequences can hybridize to form double stranded DNA or RNA, or a double stranded hybrid of RNA and DNA. Nucleic acid molecules with similar but non-identical sequences can also hybridize under appropriate conditions.

[0076] As used herein, the term “probe” refers to a population of two or more molecules to which binding of a target is desired. The molecules of a probe include nucleic acid molecules, oligonucleotides and polypeptide-nucleic acid molecules. A probe can additionally be an array of molecules.

[0077] In general, a probe is comprised of molecules immobilized on a solid support and the target is in solution. However, it is understood that a target can be bound to a solid support and a probe can be in solution. Furthermore, both the probe and the target can be in solution. It is understood that the configuration of the probe and target can be in solution or bound to a solid support, so long as the probe and target can bind to each other. When bound to a solid support, the binding of the probe or target to the support can be covalent or non-covalent, so long as the bound probe or target remains bound under conditions of contacting the solid support with a probe or target in solution and washing of the solid support. If the probe and target hybridize or otherwise specifically interact, the probe or target bound to a solid support remains bound during the hybridization and washing steps.

[0078] As used herein, the term “sampled” or “samples,” when used in reference to a nucleic acid molecule, refers to a nucleic acid molecule to which specific binding can be detected. A nucleic acid molecule that samples another molecule is capable of specifically binding to that molecule and being detected. For example, a probe can sample molecules in a target by detectably binding to molecules in the target. Those molecules in the target to which nucleic acid molecules in the probe specifically bind are therefore sampled.

[0079] As used herein, the term “arbitrarily sampled” or “arbitrarily sampled nucleic acid molecule” means that a nucleic acid molecule is sampled by binding based on its sequence without sampling based on a particular site where a molecule will bind. When generating a target comprising arbitrarily sampled nucleic acid molecules from a population of nucleic acid molecules, the target is generated without prior reference to the sequences of nucleic acid molecules in the population. Thus, it is not necessary to have previous knowledge of the nucleotide sequence of nucleic acid molecules in the population to arbitrarily sample the population. It is understood that knowledge of a nucleotide sequence of a nucleic acid molecule in the population does not preclude the ability to arbitrarily sample the population so long as the nucleotide sequence is not referenced before sampling the population. Methods for generating a probe containing arbitrarily sampled nucleic acid molecules are disclosed herein (see below and Examples I to III).

[0080] An arbitrarily sampled probe containing arbitrarily sampled nucleic acid molecules can be generated using one or more arbitrary oligonucleotides. As used herein, the term “arbitrary oligonucleotide” means that the oligonucleotide is a sequence that is selected randomly and is not selected based on its complementarity to any known sequence. As such, an arbitrary oligonucleotide can be used to arbitrarily sample a population of nucleic acid molecules.

[0081] An arbitrarily sampled nucleic acid molecule is sampled based on its sequence and is not based on binding to a predetermined sequence. For example, arbitrary oligonucleotides are oligonucleotides having an arbitrary sequence and, as such, will bind to a given nucleic acid molecule because the complementary sequence of the arbitrary oligonucleotide occurs by chance in the nucleic acid molecule. Because the oligonucleotides can bind to a nucleic acid molecule based on the presence of a complementary sequence, the sampling of the nucleic acid molecule is based on that sequence. However, the binding of the arbitrary oligonucleotide to any particular nucleic acid molecule in a population is not determined prior to the binding of the oligonucleotide, for example, by comparing the sequence of the arbitrary oligonucleotides to known nucleic acid sequences and selecting the oligonucleotides based on previously known nucleic acid sequences. The use of arbitrary oligonucleotides as primers for amplification is well known in the art (Liang and Pardee, Science 257:967-971 (1992)).
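
The chance occurrence of a site complementary to an arbitrary oligonucleotide can be sketched as follows. The sketch assumes perfect complementarity and, for the expected-value estimate, a random template with equal base frequencies; both are simplifying assumptions introduced here, since low stringency priming can tolerate mismatches. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def binding_sites(template, oligo):
    """Positions in the template where the oligo could anneal perfectly.

    The oligo anneals wherever the template contains the oligo's reverse
    complement; overlapping sites are reported.
    """
    site = reverse_complement(oligo)
    template = template.upper()
    k = len(site)
    return [i for i in range(len(template) - k + 1) if template[i:i + k] == site]


def expected_sites(template_length, oligo_length):
    """Expected number of perfect sites on one strand of a random template.

    Assumes equal base frequencies, so each position matches a given
    oligo of length k with probability 1 / 4**k.
    """
    positions = template_length - oligo_length + 1
    if positions <= 0:
        return 0.0
    return positions / 4 ** oligo_length


# Hypothetical template and arbitrary oligonucleotide, for illustration only.
print(binding_sites("TTGCTTAGCTT", "AAGC"))
print(expected_sites(1000, 8))
```

Under these assumptions a longer oligonucleotide is expected to find fewer sites by chance, which is one way to see how an arbitrary oligonucleotide samples only a subset of a population.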

[0082] As used herein, the term “oligonucleotide” refers to a nucleicacid molecule of at least 2 and less than about 1000 nucleotides. Anoligonucleotide can be, for example, at least about 5 nucleotides andless than about 100 nucleotides, for example less than about 50nucleotides.

[0083] The invention also provides a method of measuring the level oftwo or more nucleic acid molecules in a target by contacting a probewith a target comprising two or more nucleic acid molecules, wherein thenucleic acid molecules are statistically sampled and wherein thestatistically sampled nucleic acid molecules comprise a subset of thenucleic acid molecules in a population of nucleic acid molecules; anddetecting the amount of specific binding of the target to the probe.

[0084] As used herein, the term “statistically sampled nucleic acidmolecule” means that a nucleic acid sequence is sampled based on itssequence with prior reference to its nucleotide sequence bypredetermining the statistical occurrence of a nucleotide sequence intwo or more nucleic acid molecules. Thus, to obtain a statisticallysampled nucleic acid molecule, it is necessary to have previousknowledge of the nucleotide sequence of at least two nucleic acidmolecules in the population.

[0085] A statistically sampled nucleic acid molecule is sampled based onthe sequence of a nucleic acid molecule with prior reference to itsnucleotide sequence but without prior reference to a preselected portionof its nucleotide sequence. A group of oligonucleotides can beidentified without prior reference to a preselected portion of anucleotide sequence, for example, by determining a group of arbitraryoligonucleotides. The arbitrary oligonucleotides can then be referencedto known nucleotide sequences by determining which of the arbitraryprimers match the known nucleotide sequences. Such arbitraryoligonucleotides referenced to known nucleotide sequences are selectedbased on the known sequences and thus become statistical primers. Thismethod is in contrast to a method where a preselected site in a knownnucleotide sequence is identified and an oligonucleotide is specificallydesigned to match that preselected site.

[0086] Statistical sampling is advantageous because a set of oligonucleotides can be determined based on the presence in a group of known sequences of a sequence complementary to the oligonucleotides. The oligonucleotides can further be ranked based on complexity binding. Complexity binding means that a given oligonucleotide binds to more than one nucleic acid molecule. The larger the number of molecules to which an oligonucleotide can bind, the higher the “complexity binding.” Statistical selection can be used to enhance for complexity binding by ranking oligonucleotides based on the number of sequences to which the oligonucleotides will bind and selecting those that bind to the highest number (see, for example, WO 99/11823). Statistical sampling can be based, for example, on the binding of an oligonucleotide to 5 or more nucleic acid molecules, and can be based on the binding to 10 or more, 50 or more, 100 or more, 200 or more, 500 or more, 1000 or more, or even 10,000 or more nucleic acid molecules.

[0087] In addition, statistical sampling can enhance for the highest complexity binding for a given oligonucleotide, for example, by selecting the above average ranked oligonucleotides, which are complementary to more than the average number of nucleic acid molecules. The oligonucleotides can be selected for any range of complexity binding, for example, the top 10% of highest ranked complexity binding, the top 20% of highest ranked complexity binding, or the top 50% of highest ranked complexity binding.
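As a non-limiting illustration, the ranking described above can be sketched in Python. The sketch assumes that an oligonucleotide binds a sequence only where the sequence contains its exact reverse complement; the function names, the toy sequences and the perfect-match criterion are all hypothetical and are not taken from WO 99/11823.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def complexity_binding(oligo: str, sequences: List[str]) -> int:
    """Count the known sequences containing a perfect binding site for oligo.

    Assumption: binding requires an exact match to the reverse complement.
    """
    site = reverse_complement(oligo)
    return sum(1 for s in sequences if site in s.upper())


def rank_oligos(oligos: List[str], sequences: List[str]) -> List[Tuple[str, int]]:
    """Rank oligonucleotides from highest to lowest complexity binding."""
    scored = [(o, complexity_binding(o, sequences)) for o in oligos]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_fraction(ranked: List[Tuple[str, int]], fraction: float) -> List[Tuple[str, int]]:
    """Keep the top fraction (for example 0.10, 0.20 or 0.50) of a ranked list."""
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy data, invented for illustration only.
known_sequences = ["AAGCTTGGCA", "TTGCCAAGCT", "GGGAAGCTTC", "CCCCCCCCCC"]
candidate_oligos = ["AAGCTT", "GGGGGG", "TGCCAA", "ACGTAC"]

ranked = rank_oligos(candidate_oligos, known_sequences)
selected = top_fraction(ranked, 0.50)
print(ranked)
print(selected)
```

In practice the match criterion would tolerate mismatches and the sequence list would be a database of known mRNAs; the ranking and cut-off steps would otherwise be unchanged.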

[0088] Furthermore, statistical selection can be used to exclude undesirable nucleotide sequences, including conserved sequences in a family of related nucleic acid molecules (WO 99/11823). A statistical oligonucleotide can be about 5 nucleotides in length to about 1000 nucleotides in length, for example, about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 25, 30 or 50 nucleotides in length. A set of statistical primers can contain degenerate bases, for example, more than one nucleotide at any given position.

[0089] A sampled nucleic acid molecule obtained using a preselected portion of a nucleotide sequence is specifically excluded from the meaning of the term “statistically sampled nucleic acid molecule.” For example, if a portion of a known nucleotide sequence is identified and an oligonucleotide that matches the identified portion is generated to sample a nucleic acid molecule, such a sampled nucleic acid molecule would not be a statistically sampled nucleic acid molecule. However, if a group of oligonucleotides is first identified and then compared to two or more known nucleotide sequences in a population of nucleic acid molecules to determine oligonucleotides statistically present in or similar to the known nucleotide sequences, such statistically identified oligonucleotides can be used to obtain a statistically sampled nucleic acid molecule. Methods for generating a target containing statistically sampled nucleic acid molecules are disclosed herein.

[0090] A statistically sampled target containing statistically sampled nucleic acid molecules can be generated using one or more statistical oligonucleotides. As used herein, the term “statistical oligonucleotide” means that an oligonucleotide is a sequence that is selected based on its statistical occurrence of complementarity in more than one known nucleic acid molecule. As such, a statistical oligonucleotide can be used to statistically sample a population of nucleic acid molecules.

[0091] The methods of the invention detect specific binding of a target to a probe. A target can be generated, for example, by amplifying nucleic acid molecules. As used herein, the term “amplified target” refers to a target generated by enzymatically copying a nucleic acid molecule to generate more than one copy of the nucleic acid molecules in a population of nucleic acid molecules. An amplified nucleic acid target can be generated, for example, using an amplification method such as polymerase chain reaction (PCR). A target having a single copy of each nucleic acid molecule in the target sample from which the target is derived, which would have identical abundance and complexity as the original population, would not be considered an amplified target. An amplified target can be useful, for example, if nucleic acid molecules sampled by the probe are in limited quantities in the target. A nucleic acid molecule that is to be sampled and which is present in very low quantities would be difficult to detect without amplification to increase the mass of the nucleic acid molecules in the target. However, a limited complexity target, in which the complexity or number of different molecules is limited, need not be amplified.

[0092] Other methods for generating an amplified target include, for example, the ligase chain reaction (LCR); self-sustained sequence replication (3SR); beta replicase reaction, for example, using Q-beta replicase; phage terminal binding protein reaction; strand displacement amplification (SDA); nucleic acid sequence based amplification (NASBA); cooperative amplification by cross hybridization (CATCH); rolling circle amplification (RCA) and AFLP (Trippler et al., J. Viral. Hepat. 3:267 (1996); Hofler et al., Lab. Invest. 73:577 (1995); Tyagi et al., Proc. Natl. Acad. Sci. USA 93:5395 (1996); Blanco et al., Proc. Natl. Acad. Sci. USA 91:12198 (1994); Spears et al., Anal. Biochem. 247:130 (1997); Spargo et al., Mol. Cell. Probes 10:247 (1996); Gobbers et al., J. Virol. Methods 66:293 (1997); Uyttendaele et al., Int. J. Food Microbiol. 37:13 (1997); and Leone et al., J. Virol. Methods 66:19 (1997); Ellinger et al., Chem. Biol. 5:729-741 (1998); Ehricht et al., Nucleic Acids Res. 25:4697-4699 (1997); Ehricht et al., Eur. J. Biochem. 243:358-364 (1997); Lizardi et al., Nat. Genet. 19:225-232 (1998)).

[0093] The methods of the invention are useful for measuring the level of two or more nucleic acid molecules in a target. The methods of the invention can also be used to compare expression levels between two targets. In particular, the methods of the invention are useful for measuring differential expression of nucleic acid molecules (see below).

[0094] A total target, using the full complexity of the mRNA population for target preparation, can easily examine the top few hundred or a few thousand of the mRNAs in the cell (Pietu et al., Genome Res. 6:492-503 (1996)). However, a total labeled cDNA target from a mammalian cell typically has a complexity of over 100 million bases, which complicates attempts to detect differential expression among the rarer mRNAs using differential hybridization. Recent advances in the use of fluorescence and confocal microscopy have led to improvements in the sensitivity and dynamic range of differential hybridization methods, with a dynamic range of detection of 10,000-fold and the detection of transcripts at a sensitivity approaching 1/500,000 (Marshall and Hodgson, Nat. Biotechnol. 16:27-31 (1998); Ramsay, Nat. Biotechnol. 16:40-44 (1998)). Despite the improvements in sensitivity, methods using total target remain biased toward more abundant mRNAs in a sample.

[0095] The standard method for differential screening, which typically uses targets derived from reverse transcription of total message and autoradiography or phosphoimaging, can be used to detect differential expression (Pietu, supra, 1996). However, the method is limited to the most abundant messages. Only abundant transcripts are represented highly enough to yield effective targets with a sensitivity of perhaps 1/15,000 (Boll, Gene 50:41-53 (1986)). As disclosed herein, differential screening can be improved greatly by reducing the complexity of the target and by systematically increasing the amount of rarer nucleic acid molecules in the target. By enhancing the amount of less abundant nucleic acids in a target, differential screening is not confined to only the most abundant nucleic acid molecules, as observed using total target.

[0096] By reducing the complexity of the target, the ability to identify all mRNA species in a source simultaneously is sacrificed for improved kinetics and an improved signal to noise ratio. Complexity reduction methods generate a target having a subset of nucleic acid molecules in a population that allow a few rare mRNAs to contribute significantly to the final mass of the target, thereby enhancing the ability to observe differential gene expression among rare mRNAs in a source. Any method that generates a mixture of products that reliably enriches for only part of each mRNA or only a subset of the mRNA population is useful for generating a reduced complexity target.

[0097] There are two fundamentally different types of complexity reduction methods: methods that maintain the relative stoichiometry among the mRNAs they sample and methods that do not maintain stoichiometry. One class of methods yields nucleic acids representing a subset of the mRNA population and maintains the approximate stoichiometry of the input RNA. Such methods are exemplified by most amplified restriction fragment length polymorphism (AFLP) and restriction strategies that sample the 3′ end or internal fragments of mRNAs (Habu et al., Biochem. Biophys. Res. Commun. 234:516-521 (1997); Money et al., Nucleic Acids Res. 24:2616-2617 (1996); Bachem et al., Plant J. 9:745-753 (1996)). Another example is the use of size fractionated mRNAs to generate cDNA targets. All the mRNAs in, for example, the 2.0 to 2.1 kb range can be used as a reduced complexity target. Stoichiometry among these mRNAs would be mostly preserved in the target (Dittmar et al., Cell Biol. Int. 21:383-391 (1997)).

[0098] A second class of methods for generating reduced complexity targets does not preserve the stoichiometry of the starting mRNAs, though it does preserve differences among individual RNAs between target samples from which targets are made. One method to generate a reduced complexity target that does not maintain stoichiometry is to use subtracted targets, which have shown sensitivity for rare messages comparable to chips, in particular methods based on representational difference analysis or suppression subtractive hybridization (Rhyner et al., J. Neurosci. Res. 16:167-181 (1986); Lisitsyn et al., Science 259:946-951 (1993); Lisitsyn & Wigler, Methods Enzymol. 254:291-304 (1995); Jin et al., Biotechniques 23:1084-1086 (1997)).

[0099] Particularly useful methods for generating a reduced complexity target that does not maintain stoichiometry are exemplified by using arbitrarily sampled targets or statistically sampled targets. Methods using arbitrarily sampled targets and statistically sampled targets are disclosed herein. The methods using arbitrarily sampled or statistically sampled targets allow detection of low abundance nucleic acid molecules in a target. The methods of the invention are advantageous because they enhance the ability to detect low abundance nucleic acid molecules in a target and also allow detection of nucleic acid molecules in a target derived from limited quantities of nucleic acid molecules, such as a few cells or even a single cell.

[0100] An arbitrarily sampled target or statistically sampled target can be generated, for example, by amplification. If an amplified target is generated using arbitrary oligonucleotides or statistical oligonucleotides, the amplified products reflect a function of both the starting abundance of each target nucleic acid molecule and the quality of the match of the oligonucleotide to the target nucleic acid molecule to be sampled. Thus, the final mixture of amplified products can include quite abundant amplified products that derive from low abundance nucleic acid molecules that have a good match with the oligonucleotide primers used and have favorable “amplifiability” after the initial priming events. Amplifiability includes effects such as secondary structure and product size.

[0101] A consequence of generating an amplified target using arbitrary oligonucleotides or statistical oligonucleotides is that the same nucleic acid molecules in two different targets experience an identical combination of primability and amplifiability so that changes in abundance for particular mRNAs are maintained, even as the relative abundances between different nucleic acid molecules within one target are profoundly changed. This is in contrast to methods that maintain stoichiometry, where less abundant nucleic acid molecules would be present as less abundant nucleic acid molecules in the target.
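The relationship described above can be illustrated with a deliberately simplified model in which the mass of each amplified product is the starting abundance multiplied by a single per-molecule efficiency factor combining primability and amplifiability. The multiplicative form, the function name and all numerical values are assumptions chosen for illustration; real amplification is not strictly linear.

```python
from typing import Dict


def product_masses(abundances: Dict[str, float],
                   efficiencies: Dict[str, float]) -> Dict[str, float]:
    """Toy model: product mass = starting abundance x sampling efficiency.

    The efficiency stands for the combined primability and amplifiability
    of a molecule with a given primer pair. It depends on sequence only,
    so it is the same in every sample amplified with the same primers.
    """
    return {name: abundances[name] * efficiencies[name] for name in abundances}


# Hypothetical efficiencies: the rare message matches the primers well,
# the abundant message matches poorly.
efficiency = {"rare_mRNA": 50.0, "abundant_mRNA": 0.01}

# Hypothetical copies per cell in two conditions; only the rare message changes.
sample_a = {"rare_mRNA": 1.0, "abundant_mRNA": 1000.0}
sample_b = {"rare_mRNA": 10.0, "abundant_mRNA": 1000.0}

target_a = product_masses(sample_a, efficiency)
target_b = product_masses(sample_b, efficiency)

# Within one target the stoichiometry is distorted: the rare message now
# contributes more mass than the abundant one.
print(target_a)

# Between targets the fold change of each message is preserved.
fold_change_rare = target_b["rare_mRNA"] / target_a["rare_mRNA"]
fold_change_abundant = target_b["abundant_mRNA"] / target_a["abundant_mRNA"]
print(fold_change_rare, fold_change_abundant)
```

Because the efficiency factor cancels when the same molecule is compared between two targets, the tenfold change in the rare message is retained even though its share of each target bears no relation to its share of the starting RNA.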

[0102] When generating an amplified target, there are generally no particular constraints on the oligonucleotide primers. The oligonucleotide primers preferably contain at least a few C or G bases. The oligonucleotide primers also preferably do not contain 3′ ends complementary with themselves or the other primer in the reaction, to avoid primer dimers. The oligonucleotide primers are also preferably chosen to have different sequences so that the same parts of mRNA are not amplified in different fingerprints.
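The preferences listed above lend themselves to a simple screen of candidate primer pairs. The following sketch is one possible check, not a procedure taken from this disclosure: the minimum G/C count of 3 and the 4-base window used to test the 3′ ends are hypothetical thresholds, and only full complementarity across that window is detected.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_count(primer: str) -> int:
    """Number of G and C bases in the primer."""
    p = primer.upper()
    return p.count("G") + p.count("C")


def three_prime_ends_pair(a: str, b: str, window: int = 4) -> bool:
    """True if the last `window` bases of a and b are fully complementary.

    Two 3' ends anneal antiparallel, so the tail of one primer must equal
    the reverse complement of the tail of the other.
    """
    return a.upper()[-window:] == reverse_complement(b[-window:])


def primer_pair_ok(p1: str, p2: str, min_gc: int = 3, window: int = 4) -> bool:
    """Apply the three preferences: G/C content, no self- or cross-pairing
    at the 3' ends, and two different primer sequences."""
    if p1.upper() == p2.upper():
        return False
    if gc_count(p1) < min_gc or gc_count(p2) < min_gc:
        return False
    for a, b in ((p1, p1), (p2, p2), (p1, p2)):
        if three_prime_ends_pair(a, b, window):
            return False
    return True


# Invented 10- to 12-base primers for illustration.
print(primer_pair_ok("CTGACCATGC", "TCGGACTAGA"))    # acceptable pair
print(primer_pair_ok("CTGACCATGC", "GGTCAAGCAT"))    # 3' ends pair with each other
print(primer_pair_ok("GATTCCGGAATT", "TCGGACTAGA"))  # first primer pairs with itself
```

A fuller screen would also score partial 3′ overlaps and compare each new primer against those already used in earlier fingerprints.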

[0103] As disclosed herein, methods of generating arbitrarily sampled targets or statistically sampled targets can be based on methods that have been traditionally used to “fingerprint” a target sample containing nucleic acid molecules. The fingerprints are characteristic of the expression of nucleic acid molecules in a target sample. To generate an arbitrarily sampled target, one method that can be used is based on RNA arbitrarily primed PCR (RAP-PCR) (see Examples I and II; Welsh et al., Nucleic Acids Res. 18:7213-7218 (1990); Welsh et al., Nucleic Acids Res. 20:4965-4970 (1992); Liang and Pardee, Science 257:967-971 (1992)).

[0104] In RAP-PCR, both the abundance and the extent of match with the primers contribute to the prevalence of any particular product. Thus, rare mRNAs that happen to have excellent matches with the primers and are efficiently amplified are found among the more abundant RAP-PCR products, which makes a target generated by RAP-PCR non-stoichiometric. This is a very useful feature of RAP-PCR because it allows the sampling of mRNAs that are difficult to sample using other methods.

[0105] In a typical RAP-PCR fingerprint, about 50-100 cDNA fragments per lane are visible on a polyacrylamide gel, including products from relatively rare mRNAs that happen to have among the best matches with the arbitrary primers. If only 100 cDNA clones could be detected in an array by each target, then hybridization to arrays would be inefficient. However, RAP-PCR fingerprints contain many products that are too rare to visualize by autoradiography of a polyacrylamide gel. Nonetheless, these rarer products are reproducible and of sufficient abundance to serve as target for arrays when labeled at high specific activity.

[0106] As disclosed herein, a single target derived from RAP-PCR can detect about a thousand cDNAs on an array containing about 18,000 EST clones, a 10-20 fold improvement over the performance of fingerprints displayed on denaturing polyacrylamide gels. In addition, when a differentially regulated gene is detected on a cDNA array, a clone representing the transcript is immediately available, and often sequence information for the clone is also available. Furthermore, the clones are usually much longer than the usual RAP-PCR product. In contrast, the standard approaches to RNA fingerprinting require that the product be gel purified and sequenced before verification of differential expression can be performed. As disclosed herein, differentially amplified RAP-PCR products that are below the detection capabilities of the standard denaturing polyacrylamide gel and autoradiography methods can be detected using hybridization to cDNA arrays.

[0107] An arbitrarily sampled target generated by RAP-PCR can sample the top few thousand highest expressed nucleic acid molecules in a target sample and can sample different subsets of the nucleic acid molecules in a population, depending on the oligonucleotide primers used for amplification. Some of the rare nucleic acid molecules in a target are sufficiently represented to be easily detected on arrays of colonies (see Examples I and II).

[0108] To generate an arbitrarily sampled target using RAP-PCR, the RAP-PCR fingerprint is made by arbitrarily primed reverse transcription and PCR of nucleic acid molecules in a target sample, for example, messenger RNA (McClelland et al., in Differential Display Methods and Protocols, Liang and Pardee, eds., Humana Press (1997)). Alternatively, first strand cDNA can be primed with oligo dT or with random short oligomers, followed by arbitrary priming. Analysis of such a RAP-PCR “fingerprint” by gel electrophoresis reveals a complex fingerprint showing relative abundances of an arbitrary sample of about 100 transcripts (see Example II).

[0109] As disclosed herein, RAP-PCR fingerprints were converted to targets to probe or hybridize human cDNA clones arrayed as E. coli colonies on nylon membranes (Example II). Each array contained 18,432 cDNA clones from the Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) consortium. Hybridization to about 1000 cDNA clones was detected using each arbitrarily sampled target generated by RAP-PCR. Different RAP-PCR fingerprints gave hybridization patterns having very little overlap (<3%) with each other, or with hybridization patterns from total cDNA targets. Consequently, repeated application of RAP-PCR targets allows a greater fraction of the message population to be screened on this type of array than can be achieved with a radiolabeled total cDNA target.

[0110] The arbitrarily sampled targets were generated from HaCaT keratinocytes treated with EGF. Two RAP-PCR targets hybridized to 2000 clones, from which 22 candidate differentially expressed genes were observed (Example II). Differential expression was tested for 15 of these clones using RT-PCR and 13 were confirmed. The use of this cDNA array to analyze RAP-PCR fingerprints allowed for an increase in detection of 10- to 20-fold over the conventional denaturing polyacrylamide gel approach to RAP-PCR or differential display. Throughput is vastly improved by the reduction in cloning and sequencing afforded by the use of arrays. Also, repeated cloning and sequencing of the same gene, or of genes already known to be regulated in the system of interest, is minimized.

[0111] The use of RAP-PCR to generate an arbitrarily sampled target is particularly useful because it allows very high throughput discovery of differentially regulated genes (see Examples II and III). Throughput using this method is about 20-fold higher than with conventional gel-based analysis of fingerprints. Essentially, once a RAP-PCR fingerprint has been generated, instead of analyzing the product by gel electrophoresis, the RAP-PCR fingerprint is used as a target to probe or hybridize to nucleic acid molecules. Such an arbitrarily sampled target generated by RAP-PCR is particularly useful as a target for an array.

[0112] Parameters of the RAP-PCR reaction can be varied, for example, to optimize complexity of the target and enhance complexity binding. For example, to increase the complexity, Taq polymerase Stoffel fragment, which is more promiscuous than AMPLITAQ, can be used for amplification. The oligonucleotide primers used herein (Example II) were 10 or 11 bases in length and were not degenerate, having a single base at each position. Longer oligonucleotide primers used at the same temperature can give a more complex product, as would primers with some degeneracy. However, the greater the complexity of the target, the more closely it will resemble a total mRNA target, which loses the advantage of non-stoichiometric sampling. To further vary RAP-PCR parameters, the oligonucleotide primer length, degeneracy, and 3′ anchoring can be varied in the reverse transcription and PCR reactions. Various different polymerases can also be used.

[0113] The RAP-PCR fingerprint can be radiolabeled or labeled with fluorescent dyes, as described below, and used as a target to probe against dense arrays such as arrays of cDNA clones. Differences in the level of nucleic acid molecules between two targets can indicate, for example, differences in mRNA transcript levels, which usually reflect differences in gene expression levels. Differences in expression can also reflect degradation or post-translational processing. Using an arbitrarily sampled target, each target is estimated to allow the detection of roughly 10% of the total complexity of the message population, and most importantly, this 10% very effectively includes the rare message class. The rare message class is included in the target because, while RAP-PCR reflects message abundance between target samples, the cDNAs selected for amplification in any particular RAP-PCR reaction are determined by sequence rather than abundance. When the sequence match between the oligonucleotide primers and a nucleic acid molecule is very good, even a low abundance nucleic acid molecule has a good chance of being present in the final target in a larger amount relative to more abundant nucleic acid molecules.

[0114] To be suitable for either gel- or array-based analysis, RAP-PCR fingerprints should remain almost identical over an eight-fold dilution of the input RNA. Low quality RAP-PCR fingerprints are usually the consequence of poor control over RNA quality and concentration. Before proceeding with the array hybridization steps, the quality of the RAP-PCR products can be verified. Because the array method has such high throughput, this extra step is neither costly nor time-consuming, and can greatly improve efficiency by reducing the number of false positives due to poor fingerprint reproducibility. The reproducibility of RAP-PCR fingerprints as targets is exemplified herein (see Example II).

[0115] The enhanced ability of the methods of the invention to detect low abundance nucleic acid molecules in a target sample provides a major improvement over previously used methods that have limited ability to detect rare messages. It is likely that the entire complexity of the message population of a cell could be examined in a short period of time, for example, in a few weeks.

[0116] For example, as disclosed in Example II, targets generated by RAP-PCR sample a population of mRNAs largely independent of message abundance. This is because the low abundance class of messages has much higher complexity than the abundant class, making it more likely that the arbitrary primers will find good matches. Unlike differential display, RAP-PCR demands two arbitrary priming events, possibly biasing RAP-PCR toward the complex class. It is likely that the majority of the mRNA population in a cell (<20,000 mRNAs) can be found in as few as ten RAP-PCR fingerprints.
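The estimate of ten fingerprints can be checked with a back-of-the-envelope calculation. The sketch below assumes that each fingerprint samples roughly 10% of the message population, as estimated elsewhere in this description, and that different fingerprints sample independently of one another. The independence assumption and the function names are introduced here for illustration only.

```python
import math


def expected_coverage(fraction_per_fingerprint: float, n_fingerprints: int) -> float:
    """Expected fraction of messages sampled at least once.

    Assumes every fingerprint samples the same fraction of the population
    and that fingerprints are statistically independent.
    """
    return 1.0 - (1.0 - fraction_per_fingerprint) ** n_fingerprints


def fingerprints_needed(fraction_per_fingerprint: float, desired_coverage: float) -> int:
    """Smallest number of independent fingerprints reaching the desired coverage."""
    return math.ceil(math.log(1.0 - desired_coverage)
                     / math.log(1.0 - fraction_per_fingerprint))


# Ten fingerprints at about 10% each cover roughly 65% of the messages.
print(round(expected_coverage(0.10, 10), 3))

# About seven fingerprints are needed to pass 50% under these assumptions.
print(fingerprints_needed(0.10, 0.50))
```

Under these assumptions ten fingerprints cover about 65% of the message population, which is consistent with sampling the majority of messages. Primer sets chosen to minimize overlap would be expected to do better than the independent case.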

[0117] In addition to using RAP-PCR, differential display can also be used to generate an arbitrarily sampled target (see Example III). For differential display, first, reverse transcription uses a 3′ anchored primer such as an oligo(dT) primer. Next, second strand cDNA is primed with an arbitrary primer. Then PCR takes place between the arbitrary primer and the 3′ anchor.

[0118] As disclosed in Example III, a combination of one arbitrary and one oligo(dT) anchor primer was used to generate an arbitrarily sampled target for cDNA arrays. Both the RAP-PCR and differential display approaches to target preparation can use less than 1/200th of the amount of RNA used in some other array hybridization methods. Each fingerprint detected about 5-10% of the transcribed mRNAs, sampled almost independent of abundance, using inexpensive E. coli colony arrays of EST clones. The differential display protocol was modified to generate a sufficient mass of PCR products for use as a target to probe nucleic acid molecules. The use of different oligo(dT) anchor primers with the same arbitrary primer resulted in considerable overlap among the genes sampled by each target. Overlap of sampled genes can be avoided by using different arbitrary primers with each oligo(dT) anchor primer. Four genes not previously known to be regulated by EGF and three genes known to be regulated by EGF in other cell types were characterized using the arbitrarily sampled targets generated by differential display. The use of arbitrarily sampled targets generated by differential display is particularly useful for identification of differentially regulated genes.

[0119] A very large number of fingerprints that have been previously generated can be converted to effective targets to be probed by nucleic acid molecule arrays if the mass is increased by performing PCR on an aliquot of each fingerprint in the presence of sufficient dNTPs (100 μM) and primers (about 1 μM). Fingerprints can be reamplified, as previously shown (Ralph et al., Proc. Natl. Acad. Sci. USA 90:10710-10714 (1993)). Thus, previously determined differential display samples can be used to generate targets to probe arrays, allowing additional information to be obtained.

[0120] As disclosed herein, differential display was used to generate targets based on the method of Liang and Pardee (supra, 1992). The use of targets derived from oligo(dT) anchoring has some potential advantages for certain types of arrays. For example, some arrays are generated by oligo(dT) primed reverse transcription, and these clones are 3′ biased. A target generated by an oligo(dT) anchored primer and an arbitrary primer should also be 3′ biased so that each PCR product can hybridize to the corresponding 3′ biased clone. In contrast, a target generated using arbitrary priming can sample regions internal to mRNAs. If the arbitrary product is located further 5′ in the mRNA than the 3′ truncated clone, the target cannot bind to the corresponding clone.

[0121] Arbitrarily sampled targets generated using differential display with 3′ anchored oligonucleotide primers are particularly useful for probing 3′ biased libraries and, in particular, 3′ biased ESTs. 3′ anchoring is not useful for sampling RNAs that do not have poly(A) tails, such as most bacterial RNAs. Targets generated using 3′ anchor primers would also not be suitable for PCR arrays based on internal products. 3′ biased targets are also less useful for random primed libraries.

[0122] Other methods for generating an arbitrarily sampled target can also be used. One such method is a variant of RAP-PCR, called complexity limited arbitrary sample sequencing (CLASS). CLASS was conceived as a solution to a well known and frustrating limitation of Serial Analysis of Gene Expression (SAGE) (Velculescu et al., Science 270:484-487 (1995)). SAGE is a method for generating small pieces of cDNA from two sources, linking them together, and sequencing them in large numbers. The average cell contains 200,000 mRNA transcripts, representing about 20,000 different sequences, and SAGE allows sequencing of about 40 at one time. Therefore, to compare two targets using a standard sequencing apparatus, a very large number of sequencing gels, about 100, would be required to obtain information on 400,000 mRNAs, representing 200,000 mRNAs from two populations being compared. Although the method is useful for obtaining information on expression of nucleic acid molecules, each additional RNA sample increases the number of gels needed by 50, which is very expensive and time consuming. The main problem is that all 100 gels have to be run to have confidence in the statistics on rare messages that have changed in expression from 1 to 10 copies per cell.

[0123] To solve this problem, CLASS was devised. CLASS is similar to RAP-PCR except that the oligonucleotide primers used have degenerate 3′ ends. The degeneracy causes the primers to prime often, generating short sequence tags. By choosing a short PCR extension time, the predominant products come only from a fraction of the total complexity of the mRNA, and the size of this fraction can be adjusted at will by varying the number of 3′ degenerate bases. These short tags can then be concatenated and sequenced, rapidly yielding reliable statistics on a subsample of the message complexity, similar to the ligation and sequencing strategy used in SAGE (Velculescu et al., supra, 1995). The CLASS products can also be used as a target to probe, for example, against arrays.

[0124] The CLASS method is advantageous because additional sets of primers having degenerate 3′ ends can be generated and used to obtain a different sampling of nucleic acid molecules. This iterative approach to determining nucleic acid molecule expression provides more information about a pattern of expression in a source of nucleic acid molecules than the holistic approach of SAGE (Velculescu et al., supra, 1995).

[0125] In contrast to SAGE, which requires nearly complete sequencing of the 100 gels to be certain of any of the rare messages, CLASS allows nucleic acid molecule populations to be partitioned into small groups so that, with 10% of the work, confidence is generated for the results of 10% of all of the genes in the cell. With one round of CLASS, no information is obtained on 90% of the rare messages in the first pass (10 gels), but there is high confidence in the results for 10% of the nucleic acid molecules in a target sample. The high confidence in 10% of the genes is preferable because, when hunting for differentially regulated genes, it is expected that a pattern or “type of behavior” occurs during differential gene regulation. It is seldom, if ever, that a single gene is activated without the coordinate regulation of others controlled by the same pathway. Thus, if one is seeking any one of 10 low abundance transcripts regulated, for example, by a topoisomerase inhibitor, SAGE would require running 100 sequencing gels that would yield all 10 low abundance genes. In contrast, CLASS allows running 10 gels, in one-tenth the time, to identify at least one gene, which can be sufficient to identify a pattern of gene expression. Furthermore, CLASS can be used iteratively using different primers to run additional gels, for example, 50 gels, to get information on five times as many genes, whereas running 50 gels with SAGE would reveal no statistically relevant information. Therefore, CLASS is a much more economic approach to identifying a gene expression pattern.

[0126] CLASS can be applied to any species, even those for which arrays are unavailable, and to mRNAs that have not yet been deposited on arrays. Thus, whereas use of targets generated by RAP-PCR on known arrays gives expression information on known genes, CLASS gives expression information on any gene, even if not previously encountered in libraries that have been arrayed. CLASS thus provides a low cost, relatively high throughput method for obtaining information on gene expression.

[0127] The invention also provides methods of measuring the level of nucleic acid molecules in a target using a statistically sampled target. Methods useful for generating a statistically sampled target have been previously described (WO 99/11823; McClelland et al., supra, 1997; Pesole et al., Biotechniques 25:112-123 (1998); Lopez-Nieto and Nigam, Nature Biotechnology 14:857-861 (1996)). An exemplary method for generating a statistically sampled target is statistically primed PCR (SP-PCR). The main difference between a statistical priming method and RAP-PCR is that the primers are selected by a computer program to determine the statistical occurrence of a nucleotide sequence in a group of nucleic acid molecules, rather than selecting primers arbitrarily.

[0128] A method for generating a statistically sampled target can be a directed statistical selection. For example, a program called GeneUP has been devised that uses an algorithm to select primer pairs to sample sequences in a list of interest, for example, a list of human mRNA associated with apoptosis, while excluding sequences in another list, for example, a list of abundantly expressed mRNA in human cells and structural RNAs such as rRNAs, Alu repeats and mtDNA (Pesole et al., supra, 1998). A directed statistical method provides a systematic determination of whether any given oligonucleotide matches any given nucleotide sequence and the number of different nucleic acid molecules to which a given oligonucleotide can bind. Such a directed statistical method can be used to generate a statistically sampled target useful in the invention.
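The idea of sampling one list of sequences while excluding another can be sketched as follows. This is not the GeneUP algorithm, whose details are not given here. It is a hypothetical single-oligonucleotide filter that counts exact matches on either strand, keeps oligonucleotides matching at least a chosen number of sequences of interest, and rejects any that match an excluded sequence. All names, thresholds and sequences are invented for illustration.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_hits(oligo: str, sequences: List[str]) -> int:
    """Number of sequences with an exact match to the oligo on either strand."""
    fwd, rev = oligo.upper(), reverse_complement(oligo)
    return sum(1 for s in sequences if fwd in s.upper() or rev in s.upper())


def directed_selection(oligos: List[str],
                       include: List[str],
                       exclude: List[str],
                       min_hits: int = 2) -> List[Tuple[str, int]]:
    """Systematically test every oligo against every sequence in both lists.

    Returns (oligo, hits in the list of interest) for oligos that reach
    min_hits and match nothing in the exclusion list, best first.
    """
    kept = []
    for oligo in oligos:
        if count_hits(oligo, exclude) > 0:
            continue  # would sample an unwanted sequence
        hits = count_hits(oligo, include)
        if hits >= min_hits:
            kept.append((oligo, hits))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy lists standing in for sequences of interest and sequences to avoid.
of_interest = ["ATGGCCTTAGGC", "CCATGGCCTTAA", "TTTGGCCTTACG"]
to_avoid = ["GGGGCGCGCCCC", "ATATGGCCATAT"]
candidates = ["GGCCTT", "GCGCGC", "TGGCCA", "ATGGCC"]

chosen = directed_selection(candidates, of_interest, to_avoid)
print(chosen)
```

Here `ATGGCC` matches two sequences of interest but is rejected because it also matches an excluded sequence, leaving `GGCCTT` as the only oligonucleotide selected.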

[0129] Another method for generating a statistically sampled target is a Monte-Carlo statistical selection method (Lopez-Nieto and Nigam, supra, 1996). A Monte-Carlo statistical selection method randomly pairs a set of primers using a Monte-Carlo method. A Monte-Carlo method approximates the solution of determining primers that can be used for amplification by simulating a random process of primer matching. A Monte-Carlo statistical method differs from a directed statistical method in that a directed statistical method provides a systematic determination of whether any given oligonucleotide matches any given nucleotide sequence and the number of different nucleic acid molecules to which a given oligonucleotide can bind.
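
The random pairing described above can be sketched as follows. This is an illustrative simplification, not the procedure of Lopez-Nieto and Nigam: a primer pair is assumed to sample a sequence whenever both primers occur in it, ignoring strand, orientation and product length. The function names and the seed parameter are hypothetical.

```python
import random

def sampled_by_pair(pair, sequences):
    """Sequences containing both primers of the pair (simplifying assumption:
    co-occurrence stands in for successful amplification)."""
    a, b = pair
    return [s for s in sequences if a in s and b in s]

def monte_carlo_pairs(primers, sequences, trials, seed=0):
    """Draw random primer pairs and keep the pair that samples the most
    sequences among the pairs actually drawn."""
    rng = random.Random(seed)
    best_pair, best_count = None, -1
    for _ in range(trials):
        pair = tuple(rng.sample(primers, 2))  # two distinct primers
        count = len(sampled_by_pair(pair, sequences))
        if count > best_count:
            best_pair, best_count = pair, count
    return best_pair, best_count

# Hypothetical toy data.
primers = ["ACG", "TTG", "GCA", "AAG"]
sequences = ["ACGTTGCA", "TTGCAAGG", "GGTTGCAT"]
best_pair, best_count = monte_carlo_pairs(primers, sequences, trials=200)
```

Unlike the directed approach, the result is only as good as the pairs that happen to be drawn, which is the trade-off the paragraph above describes.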

[0130] In general, two arbitrarily sampled targets, generated using different pairs of arbitrary oligonucleotides, will hybridize to largely non-overlapping sets of nucleic acid molecules in a target sample. Similarly, two statistically sampled targets, generated using different pairs of statistical oligonucleotides, will hybridize to largely non-overlapping sets of nucleic acid molecules in a target. Generally, fewer than 100 products overlap among the most intensely hybridizing 2000 colonies in two differently primed reduced complexity targets (see Example I). The pattern of expression is also almost entirely different from the pattern generated by directly labeling the whole mRNA population. However, as more nucleic acid molecules are sampled by additional arbitrary sampling of the RNA population or additional statistical sampling of the RNA population, the number of non-overlapping nucleic acid molecules sampled will decrease. To some extent, the efficiency of coverage of nucleic acid molecules can be improved by the use of statistically selected primers (Pesole et al., supra, 1998). Multiple arbitrarily sampled targets generated by RAP-PCR could supply sufficient targets to cover all genes.
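
The overlap count described above can be computed directly from two sets of hybridization intensities. The following is a minimal sketch; the function names and the toy intensities are hypothetical, and clone identifiers are assumed to be shared between the two data sets.

```python
def top_n(intensities, n):
    """Set of the n clone identifiers with the highest intensity."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return set(ranked[:n])

def overlap(first, second, n):
    """Number of clones that are among the top n in both targets."""
    return len(top_n(first, n) & top_n(second, n))

# Hypothetical intensities for four clones hybridized with two targets.
target_a = {"c1": 9.0, "c2": 8.0, "c3": 1.0, "c4": 0.0}
target_b = {"c1": 0.5, "c2": 7.0, "c3": 0.2, "c4": 6.0}
shared = overlap(target_a, target_b, 2)
```

For the comparison in the text, n would be 2000 and the inputs would be the intensities measured for every colony on the array.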

[0131] The methods described above for generating arbitrarily sampled targets and statistically sampled targets can be modified. For example, a subtraction strategy can be used to generate arbitrarily sampled targets or statistically sampled targets enriched for differentially regulated nucleic acids. A target from one source of nucleic acid molecules (A) is labeled, then mixed with a few-fold excess of unlabeled target from the other source (B). The whole mixture is denatured and added to the hybridization solution for binding to the probe. The amplified nucleic acid products present in both targets form double stranded nucleic acid molecules, and the remaining available labeled target is primarily from the differences between the two targets. The same experiment can be done with labeled target from source (B) and excess unlabeled target from source (A). The binding of the probe to the two sets of subtracted targets is compared to detect differential gene expression. This procedure also partly quenches repeats present in the target cDNA mixtures. Such a subtraction method for generating an arbitrarily sampled target or statistically sampled target can thus be used to compare two conditions by using an unlabeled target from one condition to quench the labeled target from another condition.

[0132] A limitation of subtraction is that it can eliminate small differences in expression, which can then appear to be the total absence of an mRNA. Furthermore, while subtraction is useful in a binary question, it is of limited utility in cases where a large number of conditions are to be compared combinatorially.

[0133] Detection of specific binding is limited by background hybridization and incomplete blockage of repeats. Therefore, in addition to using the methods described above for generating reduced complexity targets, Cot₁ DNA can be used to quench nucleic acid repetitive elements. A Cot₁ DNA genomic fraction is enriched in repeats. A target that contains Cot₁ DNA is useful for looking at low abundance nucleic acid molecules that can be difficult to detect. Although low abundance sequences can be partly quenched by the use of total genomic DNA, Cot₁ DNA is useful for the more sophisticated arrays such as PCR-based arrays, where the signal to noise ratio is sufficiently high to be concerned about relatively poorly amplified products.

[0134] When generating an arbitrarily sampled target or a statistically sampled target, a promoter for an RNA polymerase such as T7 polymerase, T3 polymerase, SP6 polymerase or others can be incorporated into a primer so that transcription with the corresponding polymerase is used to generate the target. Using transcription to generate the target has the advantage of generating a single stranded target. A primer comprising an RNA polymerase promoter can be used in combination with any other statistical or arbitrary primer.

[0135] An arbitrarily sampled target or a statistically sampled target can also be generated using digestion ligation. In this case, a population of nucleic acid molecules used to generate the target is digested with a restriction enzyme and an oligonucleotide primer is ligated to generate an amplified target. In ligation-mediated PCR, a primer binding site or part of the primer binding site is placed on a template by ligation, for example, after site-specific cleavage.

[0136] Nested PCR can also be used to generate an arbitrarily sampled target or statistically sampled target. Nested PCR involves two PCR steps, with a first round of PCR performed using a first primer followed by PCR with a second primer that differs from the first primer in that it includes a sequence that extends one or more nucleotides beyond the first primer sequence.

[0137] Targets can be enriched for those that hybridize to a particular probe. Once a target generated by a particular arbitrary or statistically primed method has been used on a particular nucleic acid population and the resulting target used against a set of probes, then the set of targets that are detectably hybridized will be known. At that point it is possible to devise a new set of targets that includes only those that were detected or mostly those that were detected by that probe. For example, if a particular primer “A” is used for RAP-PCR using RNA from the human brain and the resulting target is hybridized to an array of cDNA clones, some of the clones will be detectably hybridized. It is then possible to make an array of only those probes that were hybridized by that particular target. Most of the cDNAs on the array can be expected to hybridize with a target developed from human brain RNA made with the same primer “A”.

[0138] In some cases, the sequences of the nucleic acids that are the basis of targets are known. Some targets hybridize detectably with a particular probe and others do not. The sequence information associated with the targets can be used to deduce the rules of arbitrary or statistical priming events that resulted in the target that hybridized to those probes. Such information will help to predict what sequences are likely to be sampled by a particular primer if that sequence occurs in the target. Such information can improve the estimates of which sequences are sampled efficiently and which sequences are sampled inefficiently by a particular primer.

[0139] The methods of the invention are particularly useful for measuring the level of a molecule in a target using an array. As used herein, the term “array” or “array of molecules” refers to a plurality of molecules stably bound to a solid support. An array can comprise, for example, nucleic acid, oligonucleotide or polypeptide-nucleic acid molecules. It is understood that, as used herein, an array of molecules specifically excludes molecules that have been resolved electrophoretically prior to binding to a solid support and, as such, excludes Southern blots, Northern blots and Western blots of DNA, RNA and proteins, respectively.

[0140] As used herein, the term “non-dot blot” array refers to an array in which the molecules of the array are attached to the solid support by a means other than vacuum filtration or spotting onto a nitrocellulose or nylon membrane in a configuration of at least about 2 spots per cm².

[0141] As used herein, the term “peptide-nucleic acid” or “PNA” refers to a peptide and nucleic acid molecule covalently bound (Nielsen, Current Opin. Biotechnol. 10:71-75 (1999)).

[0142] As used herein, the term “polypeptide,” when used in reference to PNA, means a peptide, polypeptide or protein of two or more amino acids. The term is similarly intended to refer to derivatives, analogues and functional mimetics thereof. For example, derivatives can include chemical modifications of the polypeptide such as alkylation, acylation, carbamylation, iodination, or any modification which derivatizes the polypeptide. Analogues can include modified amino acids, for example, hydroxyproline or carboxyglutamate, and can include amino acids that are not linked by peptide bonds. Mimetics encompass chemicals containing chemical moieties that mimic the function of the polypeptide regardless of the predicted three-dimensional structure of the compound. For example, if a polypeptide contains two charged chemical moieties in a functional domain, a mimetic places two charged chemical moieties in a spatial orientation and constrained structure so that the charged chemical function is maintained in three-dimensional space. Thus, all of these modifications are included within the term “polypeptide.”

[0143] The solid support for the arrays can be nylon membranes, glass, derivatized glass, silicon or other substrates. The arrays can be flat surfaces such as membranes or can be spheres or beads, if desired. The molecules can be attached as “spots” on the solid support and generally can be spotted at a density of at least about 5/cm² or 10/cm², but the density generally does not exceed about 1000/cm².

[0144] Various methods to manufacture arrays of DNA molecules have been described (reviewed in Ramsay, supra, 1998; Marshall and Hodgson, supra, 1998). Arrays are available containing nucleic acid molecules from various species, including yeast, mouse and human. The use of arrays is advantageous because differential expression of many genes can be determined in parallel.

[0145] One type of array contains thousands of PCR products per square centimeter. Arrays of PCR products from segments of mRNAs have been attached to glass, for example, and probed using cDNA populations from two sources. Each cDNA or cRNA population is labeled with a different fluorescent dye and hybridization is assessed using fluorescence (DeRisi et al., Nature Genet. 14:457-460 (1996); Schena et al., Science 270:467-470 (1995)). Arrays are also available containing over 5000 PCR products from selected I.M.A.G.E. clones. An array of PCR products also is available for every yeast ORF and for a subset of human ESTs.

[0146] Another type of array contains colonies of 18,432 E. coli clones, each carrying a different I.M.A.G.E. EST plasmid, and each spotted twice on a 22×22 cm membrane (Genome Systems). One advantage of using the arrays from the I.M.A.G.E. consortium is that more than 80% of the clones have single pass sequence reads from the 5′ or 3′ end, or both, deposited in the GenBank database. Thus, it is usually not necessary to clone or sequence any DNA to determine if there is a known gene or other ESTs that share the same sequence. UniGene clustering of human and mouse ESTs that appear to be from the same gene greatly aids in this process (http://www.ncbi.nlm.nih.gov/UniGene/index.html). Mapping onto chromosomes at a resolution of a few centiMorgans is also available for most of these clusters at the same web site. The clones on these arrays are all available to be used to probe nucleic acid molecules or to complete the sequencing (www-bio.llnl.gov). It is often possible to identify a close homolog in other species. In contrast to PCR product arrays and oligonucleotide arrays, which are free of other DNAs, each spotted EST is associated with E. coli genomic DNA from the host. Thus, the clone arrays can have higher background than PCR arrays or oligonucleotide arrays.
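
The spot density implied by the membrane format described above (18,432 clones, each spotted twice, on a 22×22 cm membrane) can be worked out with a short calculation. The sketch assumes, for illustration only, that spots are spread over the entire membrane area; the variable names are hypothetical.

```python
clones = 18_432          # E. coli clones on the membrane (from the text)
replicates = 2           # each clone is spotted twice (from the text)
side_cm = 22             # membrane is 22 x 22 cm (from the text)

spots = clones * replicates        # total spots on the membrane
area_cm2 = side_cm * side_cm       # assumes the full area is used
density = spots / area_cm2         # spots per square centimeter
```

Under that assumption the membrane carries 36,864 spots at roughly 76 spots per cm².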

[0147] If EST arrays are used, 5′ RACE can be used to extend beyond the ESTs currently available (Zhang and Frohman, Methods Mol. Biol. 69:61-87 (1997)). When cDNA libraries that contain near full length clones are available and end sequenced, it will be possible to go from a differentially hybridized spot to a full length cDNA, directly.

[0148] Another class of arrays uses oligonucleotides that are either attached to a glass or silicon surface or manufactured by sequential photochemistry on the DNA chip (Chee et al., Science 274:610-614 (1996)). Such chips can contain tens of thousands of different oligonucleotide sequences per square centimeter. Arrays of oligonucleotide nucleic acid analogs such as peptide-nucleic acids, for example, can be prepared (Weiler et al., Nucleic Acids Res. 25:2792-2799 (1997)).

[0149] Hybridization of fingerprints to arrays has the huge advantage that there is generally no need to isolate, clone, and sequence the genes detected. In principle, all known human mRNAs will fit on three membranes (about 50,000 genes), or in a smaller area on glass arrays or other solid supports. At present, each fingerprint has a sufficient complexity to hybridize to over 2000 of the 50,000 known genes.
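
The figures above (about 2000 of 50,000 genes per fingerprint) allow a rough estimate of how coverage grows as more fingerprints are used. The sketch below assumes, purely as an idealization, that each fingerprint samples genes independently and uniformly; real sampling depends on abundance and primer match, so the numbers are only indicative. The function names are hypothetical.

```python
import math

GENES = 50_000           # known genes (from the text)
PER_FINGERPRINT = 2_000  # genes detected per fingerprint (from the text)

def expected_coverage(k, fraction=PER_FINGERPRINT / GENES):
    """Expected fraction of genes detected by at least one of k fingerprints,
    assuming independent, uniform sampling (an idealization)."""
    return 1.0 - (1.0 - fraction) ** k

def fingerprints_needed(target, fraction=PER_FINGERPRINT / GENES):
    """Smallest k whose expected coverage reaches the target fraction."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))

one = expected_coverage(1)            # a single fingerprint covers 4%
k_for_90 = fingerprints_needed(0.90)  # fingerprints for 90% expected coverage
```

Under these assumptions several dozen differently primed fingerprints would be needed before most genes are sampled at least once.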

[0150] With the use of arrays, which can have thousands of genes that can bind to a target, particular genes for further characterization can be selected based on desired criteria. For example, identified genes can be chosen that are already known and for which a new role in the condition of interest can be deduced. Alternatively, some of the genes can be family members of known genes with known functions for which a plausible role can be determined.

[0151] In addition to arrays, a number of cDNA libraries are available, for example, from the I.M.A.G.E. consortium (www-bio.llnl.gov/bbrp/image/image.html), including libraries available on nylon membranes, for example, from Research Genetics (Huntsville Ala.; www.resgen.com), Genome Systems (St. Louis Mo.; www.genomesystems.com), and the German Human Genome Project (www.rzpd.de). These libraries include clones from various human tissues, stages of development, disease states and other sources.

[0152] The methods of the invention include the step of detecting the amount of specific binding of the probe to the target. As disclosed herein, a variety of detection methods can be used. For example, if a detectable moiety is a radioactive moiety, the method of detection can be autoradiography or phosphoimaging. Phosphoimaging is advantageous for quantitation and shortened data collection time. If a detectable moiety is a fluorescent moiety, the method of detection can be fluorescence spectroscopy or confocal microscopy.

[0153] The methods of the invention use nucleic acid probes to measure the level of expression of a nucleic acid molecule in a target. If a radioactive moiety is attached to a target, for example, incorporation of the radioactive moiety can be by any enzymatic or chemical method that allows attachment of the radioactive moiety. For example, end-labeling can be used to attach a radioactive moiety to the end of a nucleic acid molecule. Alternatively, a radioactive nucleotide, in particular a ³²P-, ³³P-, or ³⁵S-labeled nucleotide, can be incorporated into the nucleic acid molecule during synthesis. The use of random primed synthesis is particularly useful for generating a high specific activity target. Generally, random primed synthesis generates approximately equal amounts of randomly primed nucleic acid molecules from both strands of double stranded PCR products, which will re-anneal to some degree during hybridization to the target (see Example I). If desired, the amount of re-annealing can be limited, for example, using exoII digestion.

[0154] When generating a labeled target or probe, it is generally preferable to incorporate a labeled nucleotide that is not ATP or dATP. The use of labeled dATP can cause an increase in the background because any poly-A sequences in the target or probe will become heavily labeled and will hybridize to the strands containing poly-T stretches complementary to the poly-A tails present in all of the clones. Similarly, the use of labeled dTTP would heavily label poly-T stretches complementary to the poly-A tails in mRNA.

[0155] A fluorescent dye can also be attached to or incorporated in the probe or target. If desired, a different fluor detectable at different wavelengths can be incorporated into different targets and used simultaneously on the same probe. The use of different fluors is advantageous since multiple targets can be bound to the same probe and detected. A fluorescently labeled target can be detected using, for example, a fluorescent scanner or confocal microscope. Measuring the relative abundance of two targets simultaneously on the same array rather than on two different arrays eliminates problems that arise due to differences in the hybridization conditions or the quantity of target PCR product on replicates of the same array. Nylon membranes are typically unsuitable for most commercially available fluorescent tags due to background fluorescence from the membrane itself.

[0156] Infrared dyes are also useful as detectable moieties for attachment to a probe or target. Infrared dyes are particularly useful with targets or probes such as arrays attached to nylon membranes, provided the membrane is free of protein.

[0157] When determining the level of a nucleic acid molecule in a target, some variation can occur, in particular for certain amplification products that are very sensitive to the amplification conditions. To control for variation in amplification products between nucleic acid targets, the target can be generated at two concentrations of nucleic acid molecules, differing by a factor of two or more. The use of various nucleic acid concentrations to generate a target to confirm differential expression is described herein (see Examples II and III).
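
The two-concentration control described above can be expressed as a simple decision rule: a difference between two sources is accepted only if it is seen, in the same direction, in the targets generated at both starting concentrations. The sketch below is illustrative only. The function name, the argument names and the default two-fold cutoff are assumptions, not values taken from the Examples, and all intensities are assumed to be positive.

```python
import math

def confirmed_change(a_low, b_low, a_high, b_high, min_fold=2.0):
    """Return "up", "down" or None for source B relative to source A.

    a_low/b_low are signals from targets made at the lower nucleic acid
    concentration, a_high/b_high at the higher one. The min_fold cutoff
    is a hypothetical choice made for illustration."""
    ratio_low = math.log2(b_low / a_low)
    ratio_high = math.log2(b_high / a_high)
    cutoff = math.log2(min_fold)
    if ratio_low >= cutoff and ratio_high >= cutoff:
        return "up"
    if ratio_low <= -cutoff and ratio_high <= -cutoff:
        return "down"
    return None  # not reproduced at both concentrations

# Hypothetical signal intensities.
call_up = confirmed_change(100, 400, 210, 800)
call_none = confirmed_change(100, 400, 200, 220)
call_down = confirmed_change(400, 100, 800, 150)
```

A change seen at only one concentration is treated as unconfirmed, since it may reflect sensitivity of the amplification to its starting conditions rather than a difference between the sources.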

[0158] The methods of the invention are directed to detecting specific binding of a target to a probe. When hybridizing a target to a probe, the specificity of binding is determined by the stringency of the hybridization conditions. The length of oligonucleotide primers and the temperature of the amplification reaction contribute to the final product. The products are a function of both the starting abundance of each target nucleic acid molecule and the quality of the match between the oligonucleotide primer and the amplified nucleic acid target. For example, oligonucleotide primers of about 8 bases in length at reaction temperatures of about 60° C. can be used to generate a target. Hybridization conditions can range, for example, from about 32° C. in about 2×SSC to about 68° C. in about 0.1×SSC. The hybridization temperature can be, for example, about 40° C., about 45° C., about 50° C., about 55° C., about 60° C. or about 65° C. Furthermore, the SSC concentration (see below) can be, for example, about 0.2×, 0.3×, 0.5×, 1× or 1.5×.

[0159] The invention additionally provides a method for determining the relative amounts of nucleic acid molecules in two targets by comparing the amount of specific binding of a probe to the target, wherein the amount of specific binding corresponds to an expression level of the nucleic acid molecules in the target, to an expression level of the nucleic acid molecules in a second target. For example, if desired, the expression level in a first target, which can be a target for which the level of expression is unknown, can be compared to the expression level in a second target. The expression level in the second target can be determined, for example, by binding the same probe to the second target and determining the level of expression in the second target. The expression level in the first and second target can then be compared.
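
The comparison of two targets described above can be sketched as a per-probe ratio of binding signals. The scaling of each target to its median signal before taking the ratio is an assumption introduced here to make the two measurements comparable; it is not a step stated in the text. The function name and the toy values are hypothetical.

```python
from statistics import median

def relative_levels(first, second):
    """Per-probe ratio of the second target to the first target, after
    scaling each target to its median signal over the shared probes
    (the median scaling is an illustrative assumption)."""
    shared = sorted(set(first) & set(second))
    m1 = median(first[p] for p in shared)
    m2 = median(second[p] for p in shared)
    return {p: (second[p] / m2) / (first[p] / m1) for p in shared}

# Hypothetical binding signals for three probes in two targets.
first_target = {"g1": 100.0, "g2": 200.0, "g3": 300.0}
second_target = {"g1": 200.0, "g2": 400.0, "g3": 1800.0}
ratios = relative_levels(first_target, second_target)
```

In this toy example the second target gives twice the overall signal, so g1 and g2 are unchanged after scaling, while g3 is relatively more abundant in the second target.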

[0160] The relative expression level in a first target can also be compared to the expression level in a second target, where the abundance in the second target is already known. As used herein, the term “known” when used in reference to expression level of a nucleic acid molecule means that an abundance of a nucleic acid molecule has been previously determined. It is understood that such a known abundance would apply to a particular set of conditions. It is also understood that, for the purpose of comparing the abundance of a nucleic acid molecule in an unknown target to a known abundance, the same method of measuring the abundance between the targets is used.

[0161] The invention also provides a method of identifying two or more differentially expressed nucleic acid molecules associated with a condition. The method includes the step of measuring the level of two or more nucleic acid molecules in a target, for example using an arbitrarily sampled target or a statistically sampled target, wherein the amount of specific binding of the target to the probe corresponds to an abundance of the nucleic acid molecules in the target. The method further includes the step of comparing the relative expression level of the nucleic acid molecules in the target to an expression level of the nucleic acid molecules in a second target, whereby a difference in expression level between the targets indicates a condition.

[0162] As used herein, the term “differentially expressed” means that a molecule is present at different levels of abundance in two targets. Two targets can be from different cells or tissues, or the target can be from the same cell or tissue under different conditions. The condition can be, for example, associated with a disease state such as cancer, autoimmune disease, infection with a pathogen, including bacteria, viruses, fungi, yeast, or single-celled and multi-celled parasites; associated with a treatment such as efficacy, resistance or toxicity associated with a treatment; or associated with a stimulus such as a chemical, for example, a drug or a natural product, for example, a growth factor.

[0163] The methods of the invention are useful for determining differential gene expression between two targets. The methods of the invention can be applied to any system where differential gene expression is thought to be of significance, including drug and hormone responses, normal development, abnormal development, inheritance of a genotype, disease states such as cancer or autoimmune disease, aging, infectious disease, pathology, drug treatment, hormone activity, cell cycle, homeostatic mechanisms, and others, including combinations of the above conditions.

[0164] As disclosed herein, the abundance of nucleic acid molecules in two targets can be compared to identify two or more differentially expressed nucleic acid molecules (see Examples I to III). Using arbitrarily sampled targets, targets treated with and without EGF were hybridized with probes and a number of genes regulated by EGF were identified. EGF-regulated genes were found that increased in response to EGF and decreased in response to EGF (see Tables 1 and 2 in Examples II and III, respectively). The methods of the invention can therefore be used to determine nucleic acid molecules that increase in response to a stimulus or decrease in response to a stimulus (see Example II).

[0165] The arbitrarily sampled targets and statistically sampled targets used in the invention can readily detect less abundant nucleic acid molecules in a population. Therefore, the methods of the invention are particularly useful for identifying differentially expressed nucleic acid molecules since differentially expressed nucleic acid molecules are often less abundant.

[0166] The methods of the invention can be applied to any two targets to determine differential gene expression. The methods of the invention can be used, for example, to diagnose a disease state. In such a case, a “normal” target is compared to a potential disease target to determine differential gene expression associated with the disease. A normal target can be a target sample of the same tissue nearby the diseased tissue from the patient. A normal target can also be a sample of the same tissue from a different individual. Using methods of the invention, a profile of normal expression can be established by determining a gene expression pattern in one to many normal target samples, which can then be used to compare to a potentially diseased target sample. Differential gene expression between the normal and diseased tissue can be used to diagnose or confirm a particular disease state. Furthermore, a collection of target samples obtained from known diseased tissue can similarly be determined to identify an abundance profile of the target reflecting gene expression associated with that disease. In such a case, comparison of a potential disease target sample to a known disease target sample with no differential gene expression would indicate that the potential disease target sample was associated with the disease.

[0167] The methods of the invention can also be used to assess treatment of an individual with a drug. The analysis of gene expression patterns associated with a particular drug treatment is also known as pharmacogenomics. The methods of the invention can be used to determine efficacy of a treatment, resistance to a treatment or toxicity associated with a treatment. For example, a gene expression profile can be determined on an individual prior to treatment and after treatment for a particular disease or condition. A difference in gene expression can then be correlated with the effectiveness of the treatment. For example, if an individual is found to be responsive to treatment and if that treatment is associated with differential gene expression, the identification of differential gene expression can be used to correlate with efficacy of that treatment. As described above, a gene expression pattern associated with an untreated individual can be determined in the individual prior to treatment or can be determined in a number of individuals who have not been given the treatment. Similarly, a change in expression pattern associated with efficacy of the treatment can be determined in a number of individuals for which the treatment was efficacious. In such a case, comparison of a treated target sample to a known target sample associated with efficacious treatment with no differential gene expression would indicate that the treatment was likely to be efficacious. A similar approach can be used to determine the association of a treatment with toxicity of the treatment or resistance to a treatment. Resistance to a treatment could be associated with a change in expression pattern from an untreated target sample or could be associated with no change in the expression pattern compared to an untreated target sample.

[0168] The methods of the invention can also be used to determine co-regulated genes that can be potential targets for drug discovery. For example, a cell or organism can be treated with a stimulus and differential gene expression between the untreated target sample and the target sample treated with a stimulus can be determined. The stimulus can be, for example, a drug or growth factor. A difference in the abundance of nucleic acid molecules between an untreated target sample and a target sample treated with a stimulus can be used to identify differential gene expression associated with the stimulus. Such a differential expression pattern can be used to determine if a target sample has been exposed to a stimulus. Additionally, the gene expression profile can be used to identify other chemicals that mimic the stimulus by screening for compounds that elicit the same gene expression profile as the original stimulus. Thus, the methods of the invention can be used to identify new drugs that have a similar effect as a known drug.

[0169] The methods of the invention are useful for identifying a marker for a pathway that correlates with a drug response by determining an abundance profile for a given target sample that reflects the expression profile of the source population of nucleic acids such as the source RNA. For example, the methods of the invention can be used to define the “neighborhood” of potential therapeutic targets by identifying several genes regulated in response to a drug, thereby providing “neighbors” in a pathway that are potential drug targets. The invention can also be used to define bad neighborhoods, for example, pathways that “failed” therapeutics, which can indicate that a particular pathway should not be perturbed. Additional insights into the function of a pathway can be obtained by sequencing any differentially expressed genes for which complete sequence information is unavailable. The methods are particularly useful for drug comparison. Correlation of gene expression patterns with a drug response can be used to determine why two similar drugs have a somewhat different spectrum of effects.

[0170] With knowledge of the correlation between gene expression and response to a drug, drugs can be tested in cell types that are of more relevance to a particular disease or condition. By knowing the pathways that are present in a cell type associated with a pathology, predictions can be made regarding the drug responses of the cell type, thereby allowing the choice, from a tested panel of drugs, of those drugs that are most likely to affect the pathology. The correlation of information on drug response and gene expression also can aid in choosing drugs that would be synergistic, for example, drugs that hit non-overlapping pathways, or, for example, drugs that affect overlapping pathways when genes in the overlap are targeted.

[0171] The methods of the invention can be applied to determining the response to a stimulus, in particular to determining a response to a stimulus for drug discovery. One potential application is to use the methods of the invention on the 60 cell lines in the National Cancer Institute (NCI) drug screening panel. These 60 cell lines are maintained by the NCI and used to assess drug activity.

[0172] For example, each of the 60 cell lines of the NCI panel can be used as a complex measuring device that reports the single variable of cell growth and, secondarily, apoptosis. Changes in each cell type's growth upon treatment with a chemical such as a drug are determined. Studies of tens of thousands of drugs, when compared over all 60 cell lines, have shown that drugs with similar effects on growth share mechanisms of action. Comparing the response of the 60 cell lines to various drugs allows grouping of drugs according to their detailed chemical functionality. Consequently, the panel of cell lines has become one of the most important analytical tools for drug discovery.

[0173] The methods of the invention can be applied to analyzing drug response in the 60 cell lines of the NCI panel. As disclosed herein, the methods are applicable to determining differential gene expression, which can be correlated with the response of the cells to a particular drug. The methods can be used to identify many differentially expressed genes associated with a drug response. Therefore, an analysis of gene expression in untreated cells in the 60 cell line NCI drug screening panel can be used to determine a profile of gene expression, based on the presence or absence of mRNAs, that correlates with the responses to some of the tens of thousands of drugs that have been used on the panel.

[0174] Differential gene expression patterns are expected to correlate with drug response. Following identification of such a correlation in 30 of the cell lines, prediction of drug responses in the remaining 30 cell lines can be tested. This strategy circumvents the need to determine extensive expression profiles for all 60 cell lines for every new drug to find genes that correlate with the ability to respond to the drug. This strategy differs from previous methods in that differential expression of the gene after treatment does not need to occur. All that is necessary is that the gene be differentially regulated between cell types prior to treatment.
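
The 30/30 strategy described above can be sketched as follows: a correlation between ground-state expression of one gene and drug response is identified in 30 cell lines, a straight line is fitted to those data, and the fit is then tested on the remaining 30 cell lines. The use of Pearson correlation and a least-squares line is an assumption made for illustration, and all function names and data are hypothetical, not NCI panel measurements.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def train_and_test(expression, response, n_train=30):
    """Correlate in the first n_train cell lines, then test the fitted
    prediction against the observed response in the remaining lines."""
    train_x, test_x = expression[:n_train], expression[n_train:]
    train_y, test_y = response[:n_train], response[n_train:]
    r_train = pearson(train_x, train_y)
    slope, intercept = fit_line(train_x, train_y)
    predicted = [slope * v + intercept for v in test_x]
    return r_train, pearson(predicted, test_y)

# Hypothetical data for 60 cell lines: response rises with expression,
# with a small alternating perturbation standing in for noise.
expression = [float(i) for i in range(1, 61)]
response = [2.0 * e + 1.0 + (0.5 if i % 2 else -0.5)
            for i, e in enumerate(expression)]
r_train, r_test = train_and_test(expression, response)
```

A gene whose correlation holds up in the held-out cell lines is a candidate predictor of response, which is the point of testing the prediction on the remaining 30 lines.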

[0175] Each of the 60 cell lines has its characteristic response to drugs, and these responses depend on the cell's phenotype. The response of any cell to any drug depends on which genetic systems are operative in that cell. Once treated, the cell's genetic mechanisms are perturbed, leading to differential gene expression, differential protein modification, and a wide variety of other changes that can be subtle. Nonetheless, it is the ground state genetic pattern or profile of gene expression, before any exposure to drug, that determines how the cell responds to drugs.

[0176] The ground state genetic profile is an important state to characterize for cells, for example, cells of the NCI panel. The ground state of the cell has predictive power for how a given cell will respond to a given drug. Furthermore, the ground state is the only unifying point of reference for the behavior of almost 100,000 different drugs and can be used to determine response to additional drugs.

[0177] For example, if two steroids and two alkylating agents are applied to the panel of 60 cell lines, and their growth spectra are compared, the average responses of the cell lines to the steroids tend to be similar, the average responses to the alkylating agents tend to be similar, but a comparison of responses to steroids versus alkylating agents shows fewer similarities. This reflects the fact that steroids elicit their effects through naturally existing receptors, whereas alkylating agents elicit their effects by causing widespread damage. The signal transduction pathways for handling steroidal signals versus handling damage are largely different.

[0178] When a panel of steroids is used to challenge the 60 cell lines, some of the cells are growth accelerated, some are growth inhibited, and some are indifferent to steroids. Much of this data is available on the NCI web site (http://www.nci.nih.gov/). An obvious next step is to examine gene responses to the steroids to see which genes are activated, which are inactivated, and which are indifferent. Each cell type's genes will respond differently, depending on which of about 30 steroid receptor genes are expressed in the cell type before steroid treatment.

[0179] The various responses of genes to steroids are celltype-dependent, in large part due to which receptors are present. Bycomparing the ground state gene expression of the NCI panel of cells,the spectrum of steroid receptor genes expressed in each cell type canbe described, thereby explaining what is needed, in genetic terms, for acell to be responsive to any particular steroid.

[0180] The drug-receptor, or hormone-receptor, relationship describedabove is one example of a correlation that can be drawn between the NCIpanel baseline gene expression database and the NCI panel drug responsedatabase. Other drug responses can be readily determined. For example,drugs that induce apoptosis also induce gene expression, and differentapoptotic responses correlating with cell type can be used to determinegene products that control apoptosis.

[0181] It is understood that methods of the invention can be applied toany cell type, in addition to the NCI panel of cells, forcharacterization of a response to a drug or other stimulus. Thefunctional overlap between drugs is an important concern in drugdiscovery. A study of the responses of genes to drugs in different celltypes is useful because gene expression determines the response of thecell to the drug. The methods of the invention can therefore be appliedto determine the response of one or more cell lines to a particulardrug.

[0182] The methods can also be applied to characterize the ground stateof the NCI panel of cells. The methods described herein can be used tocorrelate the response of tens of thousands of drugs with genes in thepathways regulated by the drug. The methods of the invention can beapplied to determine an expression profile for the >80,000 drugspreviously tested with the NCI panel of cells. The methods areapplicable to determining coordinate mechanisms of drug action, likelypathways controlling drug activity, pathways that correlate withtoxicity, apoptosis and other effects of drugs.

[0183] The invention also provides methods for the use of the patternsof gene expression by a panel of different untreated cells or tissues tocorrelate basal gene expression with susceptibility to a treatment, suchas differences in the growth of cells, for example, the NCI panel ofcells, in the presence of a drug, pathogen or other stimulus. Themethods can be applied to determine genes and pathways that are presentprior to treatment and also to correlate treatment with the phenotypeinduced by the treatment.

[0184] To obtain additional information on gene expression, the expression pattern of two different RNA populations from different conditions can be determined (McClelland et al., Nucleic Acids Res. 22:4419-4431 (1994); McClelland et al., Trends Genet. 11:242-246 (1995)). For example, if apoptosis is of interest, a target from a cell that has been stressed but has not undergone apoptosis can be used to determine genes responsive to apoptosis, genes responsive to stress, and genes that respond to both. The identification of differentially regulated genes can be used to further characterize transcriptional activity of genes under various conditions. The genes can be further characterized to correlate promoters of regulated genes with signal transduction pathways that respond to a given condition.

[0185] When determining differential expression of a nucleic acidmolecule, the determination that an RNA sampled in a target isdifferentially regulated is initially made by comparing differentialabundance at two different concentrations of nucleic acid in the targetsample. Abundance is determined for the nucleic acid molecules of thetarget sample for which no difference in abundance is observed at twodifferent concentrations of RNA source. Only those hybridization eventsthat indicate differential expression at both RNA concentrations in bothRNA sources are used (see Examples II and III).

[0186] For hybridization to an array to determine differentialexpression, four membranes were used for radioactively labeled target,one for each of two concentrations of RNA for each of the two RNAsamples compared (see Examples I to III). If two color fluorescence isused for detecting the target, then two membranes are used, one for eachof the two concentrations of starting target sample nucleic acids,because the two targets with different detectable fluorescent markerscan be mixed and applied to the same probe. If a subsequent verificationstep is employed, for example, RT-PCR, one marker can be used for eachtarget sample.

[0187] Confirmation of differential expression does not require a full-length sequence and can be carried out using RT-PCR of the known region. In particular, low stringency PCR can be used to generate products a few hundred bases in length (Mathieu-Daude et al., Mol. Biochem. Parasitol. 92:15-28 (1998)). This method generates internal “control” PCR products that can be used to confirm the quality of the PCR reaction and the quality and quantity of the RNA used.

[0188] The invention additionally provides a profile of five or morestimulus-regulated nucleic acid molecules. As used herein, the term“profile” refers to a group of two or more nucleic acid molecules thatare characteristic of a target under a given set of conditions. Theinvention provides a profile comprising a portion of a nucleotidesequence selected from the group consisting of the nucleotide sequencesreferenced as SEQ ID NOS:1-45. The profile includes a portion of anucleotide sequence of the GenBank accession numbers H11520, H11161,H11073, U35048, R48633, H28735, AF019386, H25513, H25514, M13918,H12999, H05639, L49207, H15184, H15124, X79781, H25195, H24377, M31627,H23972, H27350, AB000712, R75916, X85992, R73021, R73022, U66894,H10098, H10045, AF067817, R72714, X52541, H14529, M10277, H27389,D89092, D89678, H05545, J03804, H27969, R73247, U51336, H21777, K00558,and D31765. The profile of the invention includes a portion of thenucleotide sequences encoding TSC-22, fibronectin receptor α-subunit,ray gene, X-box binding protein-1, CPE receptor, epithelium-restrictedets protein ESX and Vav-3.

[0189] The invention also provides a target comprising a portion of each of the nucleotide sequences referenced as SEQ ID NOS:1-45. The target includes a portion of a nucleotide sequence of the GenBank accession numbers H11520, H11161, H11073, U35048, R48633, H28735, AF019386, H25513, H25514, M13918, H12999, H05639, L49207, H15184, H15124, X79781, H25195, H24377, M31627, H23972, H27350, AB000712, R75916, X85992, R73021, R73022, U66894, H10098, H10045, AF067817, R72714, X52541, H14529, M10277, H27389, D89092, D89678, H05545, J03804, H27969, R73247, U51336, H21777, K00558, and D31765. The invention also provides a probe comprising a portion of a nucleic acid sequence selected from the group consisting of SEQ ID NOS:1-45.

[0190] The invention further provides a substantially pure nucleic acidmolecule comprising a nucleic acid sequence selected from the groupconsisting of SEQ ID NOS:1-45, or a functional fragment thereof, so longas the nucleic acid molecule does not include the exact SEQ ID NOS:1-45.

[0191] The invention additionally provides a method of measuring theamount of two or more nucleic acid molecules in a first target relativeto a second target. The method includes the step of hybridizing a firstamplified nucleic acid target comprising two or more nucleic acidmolecules to a probe, wherein the target is amplified from a populationof nucleic acid molecules using one or more oligonucleotides, whereinthe oligonucleotide hybridizes by chance to a nucleic acid molecule inthe population of nucleic acid molecules, wherein the amplification isnot based on abundance of nucleic acids in the population of nucleicacid molecules, and wherein the amplified nucleic acids in the targetare enhanced for less abundant nucleic acids in the population ofnucleic acid molecules. Further included in the method are the steps ofdetecting the amount of hybridization of the first amplified nucleicacid target to the probe, wherein the amount of hybridizationcorresponds to an abundance of the nucleic acid molecules in the firsttarget; and comparing the abundance of the nucleic acid molecules in thefirst target to the abundance of the nucleic acid molecules in a secondtarget, wherein the amplified nucleic acid target comprises a subset ofnucleic acids in the initial nucleic acid populations.

[0192] The invention further provides a method of measuring the amountof two or more nucleic acid molecules in a first target relative to asecond target. The method includes the step of hybridizing a firstamplified nucleic acid target comprising 50 or more nucleic acidmolecules to a probe, wherein the target is amplified from a populationof nucleic acid molecules, wherein the amplification is not based onabundance of nucleic acids in the population of nucleic acid molecules,and wherein the amplified nucleic acids in the target are enhanced forless abundant nucleic acids in the population of nucleic acid molecules.The method further includes the steps of detecting the amount ofhybridization of the amplified nucleic acid target to the probe, whereinthe amount of hybridization corresponds to an expression level of thenucleic acid molecules in the first target; and comparing the abundanceof the nucleic acid molecules in the first target to an abundance of thenucleic acid molecules in a second target, wherein the amplified nucleicacid target comprises a subset of nucleic acids in each nucleic acidpopulation such as an RNA population.

[0193] As used herein, the term “hybridizes by chance,” when referring to an oligonucleotide, means that hybridization of the oligonucleotide to a complementary sequence is based on the statistical frequency of the complementary sequence occurring in a given nucleic acid molecule. An oligonucleotide that hybridizes by chance is generated by determining the sequence of the oligonucleotide and subsequently determining if the oligonucleotide will hybridize to one or more nucleic acid molecules. The hybridization of such an oligonucleotide is not predetermined by the sequence of a known nucleic acid molecule and therefore occurs by chance. As such, an arbitrary oligonucleotide is considered to hybridize by chance since the oligonucleotides are determined without reference to the exact sequence to be amplified. In contrast, an oligonucleotide that does not hybridize by chance is one that is generated by first analyzing a known sequence and then identifying an exact sequence in the nucleic acid molecule that can be used as an oligonucleotide that will amplify an exact sequence between the oligonucleotides. The hybridization of such an oligonucleotide has been predetermined by the sequence of a known nucleic acid molecule and, therefore, does not occur by chance.
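The “statistical frequency” in this definition can be made concrete with a small calculation. The Python sketch below is an idealization, not part of the disclosed method: it assumes a random template in which the four bases are equally frequent and independent, and it counts only perfect matches on one strand. Because low stringency annealing tolerates mismatches, the number of bases that must actually pair is exposed as a parameter; the function name and its parameters are hypothetical.

```python
def expected_chance_sites(matched_bases: int, template_length: int) -> float:
    """Expected number of sites on one strand of a random template at which
    a fixed stretch of `matched_bases` nucleotides finds a perfect complement.

    Assumes equal, independent base frequencies (probability 1/4 per base).
    """
    if matched_bases < 1:
        raise ValueError("matched_bases must be at least 1")
    # Number of start positions at which the stretch fits on the template.
    positions = max(template_length - matched_bases + 1, 0)
    return positions / 4 ** matched_bases


# A full 10-base match on a 2,000-nucleotide molecule is rare ...
full_match = expected_chance_sites(10, 2000)
# ... while a match over only 7 bases is 64 times more likely per position.
partial_match = expected_chance_sites(7, 2000)
```

Under these assumptions the chance of a match falls fourfold for every additional base that must pair, which is why an arbitrary primer samples only a subset of the molecules in a population.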

[0194] As used herein, the phrase “amplification is not based onabundance” means a target comprises nucleic acid molecules which arerepresentative of the nucleic acid molecules in a population of nucleicacid molecules without regard to the relative amount of individualnucleic acid molecules in the population.

[0195] As used herein, the phrase “enhanced for less abundant nucleicacids” means that individual nucleic acid molecules that are lessabundant in the population of nucleic acid molecules are amplified sothat the amount of these less abundant nucleic acid molecules would beincreased relative to the amount of these nucleic acid molecules in theoriginal population of nucleic acid molecules. Thus, the relativeproportion of nucleic acid molecules in the population of nucleic acidmolecules would not be maintained in the target.

[0196] As used herein, the term “single sample” when used in referenceto a target means that the target is generated using nucleic acidmolecules from a single cell, tissue or organism sample that has notbeen previously exposed to another sample. For example, if a target wasgenerated from a population of nucleic acid molecules that wasdetermined by the exposure of one sample to another, for example, thesubtraction of the nucleic acid molecules of one sample from another,such a target would not be considered as coming from a single sample.

[0197] The following examples are intended to illustrate but not limit the present invention.

EXAMPLE I Generation and Use of Arbitrarily Sampled Targets to Probe a DNA Array

[0198] This example describes the generation of an arbitrarily sampled target having reduced complexity to probe a DNA array to determine mRNA expression.

[0199] A DNA fingerprint was generated using RAP-PCR and was convertedto high specific activity probe using random hexamer oligonucleotides(Genosys Biotechnologies; The Woodlands Tex.). Up to 10 μg of PCRproduct from RAP-PCR was purified using a QIAQUICK PCR Purification Kit(Qiagen, Inc.; Chatsworth Calif.), which removes unincorporated bases,primers, and primer dimers smaller than 40 base pairs. The DNA wasrecovered in 100 μl of 10 mM Tris, pH 8.3. Random primed synthesis withincorporation of radioactive phosphorus from (α-³²P)dCTP was used understandard conditions. 10% of the recovered fingerprint DNA (10 μl) wascombined with 6 μg random hexamer oligonucleotide primer, and 1 μg ofone of the fingerprint primers (Genosys) in a total volume of 28 μl,boiled for 3 min, then placed on ice. The hexamer/primer/DNA mix wasmixed with 22 μl reaction mix to yield a 50 μl reaction containing a0.05 mM concentration of three dNTP (dATP, dTTP and dGTP; minus dCTP),100 μCi of 3000 Ci/mmol (α-³²P) dCTP (10 μl), 1× Klenow fragment buffer(50 mM Tris-HCl, pH 8.0, 10 mM MgCl₂, 50 mM NaCl) and 8 U Klenowfragment (3.82 U/μl; Gibco-BRL Life Technologies; Gaithersburg Md.). Thereaction was performed at room temperature for 4 hr. For maximum targetlength, the reaction was chased by adding 1 μl of 2.5 mM dCTP andincubated for 15 min at room temperature followed by an additional 15min incubation at 37° C. The unincorporated nucleotides and hexamerswere removed with the Qiagen Nucleotide Removal Kit (Qiagen) and thepurified products were eluted twice in 140 μl 10 mM Tris, pH 8.3.

[0200] For hybridization to the array, four membranes were used for radioactively labeled target, one for each of two concentrations of RNA for each of the two RNA samples to be compared. To prepare the cDNA filters (Genome Systems), the filters were prewashed in three changes of 2×SSC and 0.1% sodium dodecyl sulfate (SDS) in a horizontally shaking flat bottom container to reduce the residual bacterial debris. 20×SSC contains 3 M NaCl, 0.3 M Na₃citrate-2H₂O, pH 7.0. The first wash was carried out in 500 ml for 10 min at room temperature. The second and third washes were carried out in 1 liter of prewarmed (50° C.) prewash solution for 10 min each.
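Since the wash solutions are all specified as multiples of SSC, the salt concentrations and stock volumes follow directly from the 20×SSC composition given above. The Python sketch below shows the arithmetic; the helper names are hypothetical, and preparing working solutions by diluting a 20× stock is an assumed (though common) practice that the text does not spell out.

```python
# Composition of 20x SSC as stated in the text, in mol/l.
SSC_20X_MOLAR = {"NaCl": 3.0, "Na3citrate": 0.3}


def ssc_molarity(strength: float) -> dict:
    """Molar salt concentrations of an SSC solution of the given strength,
    for example 2 for 2x SSC or 0.1 for 0.1x SSC."""
    return {salt: molar * strength / 20 for salt, molar in SSC_20X_MOLAR.items()}


def stock_volume_ml(strength: float, final_volume_ml: float) -> float:
    """Volume of 20x SSC stock needed to make `final_volume_ml` of solution
    at the given strength (C1 * V1 = C2 * V2)."""
    return final_volume_ml * strength / 20


prewash = ssc_molarity(2)                    # 2x SSC used for the prewashes
stock_for_prewash = stock_volume_ml(2, 500)  # for the 500 ml first wash
```

For 2×SSC this gives 0.3 M NaCl and 0.03 M citrate, and the 500 ml first wash corresponds to 50 ml of 20× stock.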

[0201] For prehybridization, the filters were transferred to roller bottles and prehybridized in 60 ml prewarmed (42° C.) prehybridization solution containing 6×SSC, 5× Denhardt's reagent, 0.5% SDS, 100 μg/ml fragmented, denatured salmon sperm DNA (Pharmacia; Piscataway N.J.) and 50% formamide (Aldrich; Milwaukee Wis.) for 1-2 hr at 42° C. 50× Denhardt's solution contains 1% Ficoll, 1% polyvinylpyrrolidone and 1% bovine serum albumin, sterile filtered.

[0202] For hybridization, the prehybridization solution was removed and7 ml prewarmed (42° C.) hybridization solution, containing 6×SSC, 0.5%SDS, 100 μg/ml fragmented, denatured salmon sperm DNA and 50% formamide,was added. To decrease the background hybridization due to repeatedsequences such as Alu repeats, long interspersed repetitive elements(LINE) or centromeric DNA repeats, sheared human genomic DNA (1 μg/mlstock concentration) was denatured in a boiling water bath for 10 minand immediately added to the hybridization solution to a finalconcentration of 10 μg/ml. Simultaneously, the labeled target (280 μl)was denatured in a boiling water bath for 4 min and immediately added tothe hybridization solution. Hybridization was carried out at 42° C. for2 to 48 hrs, typically 18 hr, in a hybridization oven using rollerbottles or sealed in a plastic bag and incubated in a water bath.

[0203] For the washes, the temperature was set to 55° C. in theincubator oven (Techne HB-1D; VWR Scientific; San Francisco Calif.). Thehybridization solution was poured off and the membrane was washed twicewith 50 ml 2×SSC and 0.1% SDS for 5 min at room temperature. Themembrane was then washed with 100 ml 0.1×SSC and 0.1% SDS and incubatedfor 10 min at room temperature. For the further washes, the washsolution, containing 0.1×SSC and 0.1% SDS, was prewarmed to 50° C. andthe filter was washed for 40 min in a roller bottle with 100 ml washsolution. The filter was then transferred to a horizontally shaking flatbottom container and washed in 1 liter of the wash solution for 20 minunder gentle agitation. The filter was transferred back to a rollerbottle containing 100 ml prewarmed 0.1×SSC and 0.1% SDS and incubatedfor 1 hr. The final wash solution was removed and the filter brieflyrinsed in 2×SSC at room temperature.

[0204] After washing, the membranes were lightly dried with 3MM paperand the slightly moist membranes were wrapped in SARAN wrap. Themembranes were exposed to X-ray film.

[0205] FIG. 1 shows differential hybridization to clone arrays. All four images show a closeup of an autoradiogram for the same part of a larger membrane. Each image spans about 4000 double spotted E. coli colonies, each carrying a different EST clone. Panel A shows hybridization of 1 μg of polyA⁺ RNA from confluent human keratinocytes that was radiolabeled during reverse transcription. About 500 clearly hybridizing clones can be seen. Panels B and C show RAP-PCR fingerprints, generated with a pair of arbitrary primers on oligo(dT)-primed cDNA from confluent human keratinocytes that were untreated (Panel B) or treated with EGF (Panel C). The pattern of hybridizing genes was almost identical in Panels B and C, but entirely different from that seen with total polyA⁺ RNA (compare to Panel A). The two radiolabeled colonies from one differentially expressed cDNA are indicated with an arrow. Differential expression of this gene was subsequently confirmed by specific RT-PCR (Trenkle et al., Nucl. Acids Res. 26:3883-3891 (1998)).

[0206] FIG. 1D shows a RAP-PCR fingerprint with a different pair of arbitrary primers that was performed on RNA from confluent human keratinocytes. This pattern of hybridization is almost entirely different from that found with the previous primer pair (Panel B) and with mRNA (Panel A), with very few overlapping spots between Panel D and Panels A and B.

[0207] These results demonstrate that arbitrarily sampled targets, whichhave reduced complexity, allow detection of mRNAs that are notdetectable using total message as a target. Thus, unlike a total messagetarget, which detects mRNAs based on their abundance, an arbitrarilysampled target can be used to detect less abundant mRNAs.

EXAMPLE II An Arbitrarily Sampled Target Generated by RT-PCR Detects Genes Differentially Expressed in Response to EGF

[0208] This example describes the use of RT-PCR with arbitrary primers to generate an arbitrarily sampled target for detecting differential gene expression upon treatment of cells with EGF.

[0209] An arbitrarily sampled target generated by RT-PCR was used toprobe arrays for differential gene expression (Trenkle et al., NucleicAcids Res. 26:3883-3891 (1998)). For RNA preparation, the immortal humankeratinocyte cell line HaCaT (Boukamp et al., Genes Chromosomes Cancer19:201-214 (1997)) was grown to confluence and maintained at confluencefor two days. The media, DMEM containing 10% fetal bovine serum (FBS)and penicillin/streptomycin was changed one day prior to experiments.EGF (Gibco-BRL) was added at 20 ng/ml, or TGF-β (R&D Systems;Minneapolis Minn.) was added at 5 ng/ml. Treated and untreated cellswere harvested after four hours by scraping the petri dishes in thepresence of lysis buffer (RLT buffer; Qiagen) and homogenized throughQiashredder columns (Qiagen). On average, 7×10⁶ cells, grown toconfluency in a 100 mm diameter petri dish, yielded 40 μg of total RNAfrom the RNEASY total RNA purification kit (Qiagen). RNA, in 20 mM Tris,10 mM MgCl₂ buffer, pH 8 was incubated with 0.08 U/μl of RNase freeDNase and 0.32 U/μl of RNase inhibitor (both from Boehringer MannheimBiochemicals; Indianapolis Ind.) for 40 min at 37° C. and cleaned againusing the RNEASY kit, which is important for removing small amounts ofgenomic DNA that can contribute to the fingerprints. RNA quantity wasmeasured by spectrophotometry, and RNA samples were adjusted to 400ng/μl in water. RNA samples were checked for quality and concentrationby agarose gel electrophoresis and stored at −20° C.

[0210] For RNA fingerprinting, RAP-PCR was performed using standard protocols (McClelland et al., supra, 1994). Reverse transcription was performed on total RNA using four concentrations per sample (1000, 500, 250 and 125 ng per reaction) and an oligo d(T) primer (15-mer) (Genosys). RNA (5 μl) was mixed with 5 μl of buffer for a 10 μl final reaction volume containing 50 mM Tris, pH 8.3, 75 mM KCl, 3 mM MgCl₂, 20 mM dithiothreitol (DTT), 0.2 mM of each dNTP, 0.5 μM of primer, and 20 U of MuLV-reverse transcriptase (Promega; Madison Wis.). RNA samples were checked for DNA contaminants by including a reverse transcriptase-free control in initial RAP-PCR experiments. The reaction was performed at 37° C. for 1 hr, after a 5 min ramp from 25° C. to 37° C. The enzyme was inactivated by heating the samples at 94° C. for 5 min, and the newly synthesized cDNA was diluted 4-fold in water.

[0211] PCR was performed after the addition of a pair of two different 10- or 11-mer oligonucleotide primers of arbitrary sequence; pair A: GP14 (GTAGCCCAGC; SEQ ID NO:) plus GP16 (GCCACCCAGA; SEQ ID NO:), pair B: Nucl+ (ACGAAGAAGAAGAG; SEQ ID NO:) plus OPN24 (AGGGGCACCA; SEQ ID NO:). In general, there are no particular constraints on the primers except that they contain at least a few C or G bases, that the 3′ ends are not complementary with themselves or the other primer in the reaction, to avoid primer dimers, and that primer sets are chosen that are different in sequence so that the same parts of mRNA are not amplified in different fingerprints.
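The primer constraints listed above lend themselves to a simple automated check. The Python sketch below is one possible reading of them, not a procedure from the text: the thresholds `min_gc=3` (for “at least a few C or G bases”) and `window=4` (the number of 3′ bases tested for complementarity) are assumptions chosen for illustration, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_count(primer: str) -> int:
    """Number of G or C bases in the primer."""
    return sum(base in "GC" for base in primer.upper())


def three_prime_ends_pair(primer_a: str, primer_b: str, window: int = 4) -> bool:
    """True if the last `window` bases of the two primers are perfectly
    complementary when aligned antiparallel, which favors primer dimers."""
    end_a = primer_a.upper()[-window:]
    end_b = primer_b.upper()[-window:]
    return end_a == reverse_complement(end_b)


def check_primer_pair(primer_a: str, primer_b: str,
                      min_gc: int = 3, window: int = 4) -> list[str]:
    """Return a list of problems found; an empty list means the pair passes."""
    problems = []
    for name, primer in (("A", primer_a), ("B", primer_b)):
        if gc_count(primer) < min_gc:
            problems.append(f"primer {name} has fewer than {min_gc} G/C bases")
        if three_prime_ends_pair(primer, primer, window):
            problems.append(f"primer {name} has a self-complementary 3' end")
    if three_prime_ends_pair(primer_a, primer_b, window):
        problems.append("the 3' ends of primers A and B are complementary")
    if primer_a.upper() == primer_b.upper():
        problems.append("the two primers are identical")
    return problems


# Pair A from the text: GP14 plus GP16.
pair_a_problems = check_primer_pair("GTAGCCCAGC", "GCCACCCAGA")
```

Under these assumed thresholds, pair A (GP14 plus GP16) yields an empty problem list. The check covers only perfect complementarity at the 3′ ends; partial complementarity and the requirement that different primer sets differ in sequence would need additional logic.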

[0212] Diluted cDNAs (10 μl) were mixed with the same volume of 2×PCR mixture containing 20 mM Tris, pH 8.3, 20 mM KCl, 6.25 mM MgCl₂, 0.35 mM of each dNTP, 2 μM of each oligonucleotide primer, 2 μCi α-(³²P)-dCTP (ICN; Irvine Calif.) and 5 U AMPLITAQ DNA polymerase Stoffel fragment (Perkin-Elmer-Cetus; Norwalk Conn.) for a 20 μl final reaction volume. Thermocycling was performed using 35 cycles of 94° C. for 1 min, 35° C. for 1 min and 72° C. for 2 min.
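Because the diluted cDNA still carries reverse transcription buffer, the final PCR concentrations are not simply half of the 2× mixture. The Python sketch below works through the arithmetic from the values stated in the reverse transcription and PCR paragraphs. It is a simplified estimate under assumptions: it ignores dNTPs consumed during reverse transcription and any components not listed in both buffers, and the dictionary and function names are hypothetical.

```python
# Reverse transcription buffer and 2x PCR mixture as stated in the text (mM).
RT_BUFFER_MM = {"Tris": 50.0, "KCl": 75.0, "MgCl2": 3.0, "each dNTP": 0.2}
PCR_MIX_2X_MM = {"Tris": 20.0, "KCl": 20.0, "MgCl2": 6.25, "each dNTP": 0.35}


def final_pcr_concentrations(rt_buffer, pcr_mix_2x, cdna_dilution=4.0,
                             cdna_volume_ul=10.0, mix_volume_ul=10.0):
    """Estimate final concentrations (mM) after mixing diluted cDNA with
    the 2x PCR mixture, including carry-over from the RT buffer."""
    total_volume = cdna_volume_ul + mix_volume_ul
    final = {}
    for component, mix_conc in pcr_mix_2x.items():
        # Concentration left in the cDNA after the 4-fold dilution in water.
        carried_over = rt_buffer.get(component, 0.0) / cdna_dilution
        final[component] = (carried_over * cdna_volume_ul
                            + mix_conc * mix_volume_ul) / total_volume
    return final


final_mm = final_pcr_concentrations(RT_BUFFER_MM, PCR_MIX_2X_MM)
```

Under these assumptions the estimate comes to about 3.5 mM MgCl₂ and 0.2 mM of each dNTP in the 20 μl reaction, which suggests the uneven values in the 2× mixture were chosen to compensate for the carry-over.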

[0213] A 3.5 μl aliquot of the amplification products was mixed with 9 μl of formamide dye solution, denatured at 85° C. for 4 min, and chilled on ice. An aliquot of 2.4 μl was loaded onto a 5% polyacrylamide, 43% urea gel prepared with 1×TBE buffer containing 0.09 M Tris-borate, 0.002 M ethylene diamine tetraacetic acid (EDTA). The PCR products resulting from the four different concentrations of the same RNA template were loaded side by side on the gel.

[0214] Electrophoresis was performed at 1,700 V or at a constant powerof 50-70 Watts until the xylene cyanol tracking dye reached the bottomof the gel (approximately 4 h). The gel was dried under vacuum andplaced on Kodak BioMax X-Ray film for 16 to 48 hours.

[0215] For labeling of RAP-PCR products for use as targets to probe arrays, up to 10 μg of PCR product from RAP-PCR was purified using a QIAQUICK PCR Purification Kit (Qiagen), which removes unincorporated bases, primers, and primer dimers under 40 base pairs. The DNA was recovered in 50 μl of 10 mM Tris, pH 8.3.

[0216] Random primed synthesis with incorporation of α-(³²P)-dCTP wasperformed essentially as described in Example I. Briefly, 10% of therecovered fingerprint DNA, typically about 100 ng in 5 μl, was combinedwith 3 μg random hexamer oligonucleotide primer and 0.3 μg of each ofthe fingerprint primers in a total volume of 14 μl, which was boiled for3 min and then placed on ice.

[0217] The hexamer/primer/DNA mix was combined with 11 μl reaction mix to yield a 25 μl reaction containing 0.05 mM of three dNTP (minus dCTP), 50 μCi of 3000 Ci/mmol α-(³²P)-dCTP (5 μl), 1× Klenow fragment buffer, containing 50 mM Tris-HCl, 10 mM MgCl₂, 50 mM NaCl, pH 8.0, and 4 U Klenow fragment (Gibco-BRL). The reaction was performed at room temperature for 4 hrs. For maximum target length, the reaction was chased by adding 1 μl of 1.25 mM dCTP and incubated for 15 min at 25° C., followed by an additional 15 min incubation at 37° C. The unincorporated nucleotides, hexamers and primers were removed with the Qiagen Nucleotide Removal Kit (Qiagen) and the purified products were eluted using two aliquots of 140 μl of 10 mM Tris, pH 8.3.

[0218] For labeling of poly(A)⁺ mRNA and genomic DNA for use as atarget, random hexamers were used to label poly(A)⁺-selected mRNA andgenomic DNA. Genomic DNA (150 ng) was labeled using the same protocolused for labeling the RAP-PCR products described above. Poly(A)⁺ mRNA (1μg) and 9 μg random hexamer in a volume of 27 μl were incubated at 70°C. for 2 min and chilled on ice. The RNA/hexamer mix was mixed with 23μl master mix, which contained 10 μl 5×AMV reaction buffer, containing250 mM Tris-HCl, pH 8.5, 40 mM MgCl₂, 150 mM KCl, 5 mM DTT, 1 μl threedNTP, each 33 mM (dATP, dTTP, dGTP; minus dCTP), 2 μl AMV reversetranscriptase (20 units; Boehringer Mannheim) and 10 μl 3000 Ci/mmolα-(³²P)-dCTP in a final volume of 50 μl. The reaction was incubated atroom temperature for 15 min, ramped for 1 hour to 47° C., held at 47° C.for 1 hr, and chased with 1 μl of 33 mM dCTP for another 30 min at 47°C. The labeled products were purified as described above.

[0219] For hybridization to the array, four membranes were used, onemembrane for each of two concentrations of RNA for each of the two RNAsamples to be compared. The cDNA filters (Genome Systems) were washed inthree changes of 2×SSC and 0.1% SDS in a horizontally shaking flatbottom container to reduce the residual bacterial debris. The first washwas carried out in 500 ml for 10 min at room temperature. The second andthird washes were carried out in 1 liter of prewash solution, prewarmedto 55° C., for 10 min each wash.

[0220] For prehybridization, the filters were transferred to roller bottles and prehybridized in 60 ml prehybridization solution, prewarmed to 42° C., containing 6×SSC, 5× Denhardt's reagent, 0.5% SDS, 100 μg/ml fragmented, denatured salmon sperm DNA, and 50% formamide for 1-2 hrs at 42° C. in a hybridization oven.

[0221] For hybridization, the prehybridization solution was removed and7 ml hybridization solution, prewarmed to 42° C., containing 6×SSC, 0.5%SDS, 100 μg/ml fragmented, denatured salmon sperm DNA, and 50%formamide, was added. To decrease the background hybridization due torepeats such as Alu and Line elements, sheared human genomic DNA wasdenatured in a boiling water bath for 10 min and immediately added tothe hybridization solution to a final concentration of 10 μg/ml. 10ng/ml poly(dA) was added to block oligo d(T) stretches in theradiolabeled target. Simultaneously, the labeled target, in a totalvolume of 280 μl, was denatured in a boiling water bath for 4 min andimmediately added to the hybridization solution. The hybridization wascarried out at 42° C. for 2-48 hrs, typically 18 hrs, in large rollerbottles.

[0222] For the washes, the incubator oven temperature was set to 68° C. The hybridization solution was poured off and the membrane was washed twice with 50 ml 2×SSC and 0.1% SDS at room temperature for 5 min. The wash solution was then replaced with 100 ml 0.1×SSC and 0.1% SDS and incubated for 10 min at room temperature. For the further washes, the wash solution, containing 0.1×SSC and 0.1% SDS, was prewarmed to 68° C. The membranes were incubated 40 min in 100 ml of wash solution in the roller bottles, then the filters were transferred to horizontally shaking flat bottom containers and washed in 1 liter for 20 min under gentle agitation. The filters were transferred back to the roller bottles containing 100 ml 0.1×SSC and 0.1% SDS, prewarmed to 68° C., and incubated for 1 hr. The final wash solution was removed and the filters were briefly rinsed in 2×SSC at room temperature.

[0223] After washing, the membranes were blotted with 3MM paper, wrappedin SARAN wrap while moist, and exposed to X-ray film. The membranes wereusually sufficiently radioactive that a one-day exposure with a screenrevealed the top 1000 products on an array of 18,432 bacterial coloniescarrying EST clones. Weaker targets or fainter hybridization events werevisualized using an intensifying screen at −70° C. for a few days.

[0224] For confirmation of differential expression, low stringency RT-PCR was used. The initial confirmation of differential expression was the use of two RNA concentrations per sample. Only those hybridization events that indicated differential expression at both RNA concentrations in both RNA samples were relied upon.
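The acceptance rule in this paragraph, that a difference must appear at both RNA concentrations, can be expressed as a short filter. The Python sketch below is illustrative only: the text gives no numerical cutoff, so the `min_fold=2.0` threshold is an assumption, as are the function name and the use of simple signal ratios.

```python
def call_differential(sample_a, sample_b, min_fold=2.0):
    """Classify one array spot from signals measured at matching RNA
    concentrations in two samples.

    `sample_a` and `sample_b` hold one positive signal per RNA
    concentration, in the same order. Returns "up" if sample_b exceeds
    sample_a by at least `min_fold` at every concentration, "down" if it is
    lower by at least `min_fold` at every concentration, otherwise None.
    """
    if not sample_a or len(sample_a) != len(sample_b):
        raise ValueError("need one signal per concentration for each sample")
    if min_fold <= 1:
        raise ValueError("min_fold must be greater than 1")
    if any(value <= 0 for value in (*sample_a, *sample_b)):
        raise ValueError("signals must be positive")
    ratios = [b / a for a, b in zip(sample_a, sample_b)]
    if all(ratio >= min_fold for ratio in ratios):
        return "up"
    if all(ratio <= 1 / min_fold for ratio in ratios):
        return "down"
    # A difference seen at only one concentration is not relied upon.
    return None
```

A spot whose signal rises threefold at the higher RNA concentration but only 1.2-fold at the lower one returns `None`, mirroring the requirement that both concentrations agree.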

[0225] More than 70% of the I.M.A.G.E. consortium clones have singlepass sequence reads from the 5′ or 3′ end, or both, deposited in theGenBank database. In cases where there is no prior sequence informationavailable, the clones can be ordered from Genome Systems and sequenced.Sequences were used to derive PCR primers of 18 to 25 bases in lengthusing MacVector 6.0 (Oxford Molecular Group; Oxford UK). Generally,primers were chosen to generate PCR products of 50 to 250 base pairs andhave melting temperatures of at least 60° C.

[0226] Reverse transcription was performed under the same conditions asin the RAP-PCR protocol described above, using an oligo-d(T) primer or amixture of random 9-mer primers (Genosys). The PCR reaction wasperformed using the two pairs of specific primers described below (18 to25-mers). The PCR conditions were the same as in the RAP-PCR fingerprintprotocol except that 1.5 μM of each primer was used. A low stringencythermal profile was used: 94° C. for 40 sec, 47° C. for 40 sec, and 72°C. for 1 min, for 19, 22 and 25 cycles in three separate reaction tubes.The reactions were carried out in three sets of tubes at different cyclenumbers because the abundance of the transcripts, the performance of theprimer pairs, and the amplifiability of the PCR products can vary. PCRproducts were run under the same conditions as above on a 5%polyacrylamide and 43% urea gel. The gel was dried and exposed to X-rayfilm for 18 to 72 hours. Invariance among the other arbitrary productsin the fingerprint was used as an internal control to indicate thereliability of the relative quantitation.

[0227] Primer pairs (Genosys) were used for confirmation of differential expression. For GenBank accession number H11520 (90 nucleotide product); primer A, AATGAGGGGGACAAATGGGAAGC (SEQ ID NO:); primer B, GGAGAGCCCTTCCTCAGACATGAAG (SEQ ID NO:). For TSC-22 gene (GenBank accession numbers U35048, H11073, H11161; 179 nucleotide product); primer A, TGACAAAATGGTGACAGGTAGCTGG (SEQ ID NO:); primer B, AAGTCCACACCTCCTCAGACAGCC (SEQ ID NO:).

[0228] For GenBank accession number R48633 (178 nucleotide product); primer A, CCCAGACACCCAAACAGCCGTG (SEQ ID NO:); primer B, TGGAGCAGCCGTGTGTGCTG (SEQ ID NO:).

[0229] The array analyzed contains 18,432 E. coli colonies, each carrying a different I.M.A.G.E. consortium EST plasmid (www-bio.llnl.gov/bbrp/image/image.html), spotted twice on a 22×22 cm membrane (Genome Systems). The Genome Systems arrays are advantageous in that they contain by far the largest number of ESTs per unit cost.

RNA fingerprinting for target preparation.

[0230] RAP-PCR amplifications were performed to look for differential gene expression in keratinocytes (HaCaT) when treated with EGF or TGF-β for four hours (Boukamp et al., supra, 1997). These experiments were designed to detect genes differentially regulated by EGF and TGF-β treatment in confluent keratinocytes. Using RAP-PCR, about 1% of the genes in normal or immortal keratinocytes responded to EGF, and fewer responded to TGF-β in this time frame.

[0231] Shown in FIG. 2 are RAP-PCR fingerprints of RNA from confluent keratinocytes treated with TGF-β or EGF using multiple RNA concentrations and two sets of arbitrarily chosen primers. Reverse transcription was performed with an oligo-dT primer on 250, 125, 62.5 and 31.25 ng RNA in lanes 1, 2, 3, and 4, respectively. RNA was from untreated, TGF-β treated or EGF treated HaCaT cells, as indicated. RAP-PCR was performed with two sets of primers, GP14 and GP16 (Panel A) or Nucl+ and OPN24 (Panel B). The sizes of the two differentially amplified RAP-PCR products are indicated with arrows (317 and 291 nucleotides).

[0232] In the first fingerprint shown in FIG. 2A, two differentially regulated products were detected, which were cloned and sequenced. The sizes of these two products, 291 and 317 nucleotides, are indicated with arrows (see FIG. 2A). The Genome Systems arrays used were chosen based on the presence of these two clones. This fingerprint was used to demonstrate that differentially regulated genes in an array can be identified without isolating, cloning and sequencing the RAP-PCR products. The fingerprint shown in FIG. 2A and the second fingerprint shown in FIG. 2B, which displayed no differential regulation in response to the treatments, were also used to demonstrate that fainter differentially regulated products not visible on the fingerprint gel could, nevertheless, be observed by the array approach.

[0233] The results obtained were highly reproducible. Using gel electrophoresis, there were no differences among the ~100 bands visible in any of the fingerprints from a single treatment condition performed at different RNA concentrations (see FIG. 2). Similarly, more than 99% of the top 1000 clones hybridized by the targets derived from the fingerprint in FIG. 2A were visible at both input RNA concentrations. Furthermore, more than 98% of the products were the same between the two treatment conditions, plus and minus EGF, at a single RNA concentration. These results indicated high reproducibility among the top 1000 PCR products in the RAP-PCR amplification.

[0234] The untreated control and EGF-treated samples were further characterized. RAP-PCR fingerprints shown in FIG. 2 were converted into high specific activity radioactive targets by random primed synthesis using α-(³²P)-dCTP as described above. For each of the two conditions, EGF treated and untreated, fingerprints generated from RNA at two different concentrations were converted to target by random primed synthesis for each of the two different fingerprinting primer pairs. These radioactively labeled fingerprint targets were then hybridized to a set of identical arrays, each containing 18,432 I.M.A.G.E. consortium cDNA clones. As controls, total genomic DNA and total poly(A)⁺ mRNA were also labeled by random priming, as described above, and used as targets on identical arrays.

[0235] The RAP-PCR fingerprint targets, the total mRNA target and the genomic target were hybridized individually against replicates of a Genome Systems colony array. Genomic DNA was used as a blocking agent and as a competitor for highly repetitive sequences. Washing at 68° C. in 0.1×SSC and 0.1% SDS removed virtually all hybridization to known Alu elements on the membrane, presumably because Alu elements are sufficiently diverged from each other at this wash stringency.

[0236] Shown in FIG. 3 are autoradiograms from the same half of each membrane. All images presented are autoradiograms of the bottom half of duplicates of the same filter (Genome Systems) probed by hybridization with radiolabeled DNA. Panels A and B show hybridization of two RAP-PCR reactions generated using the same primers (GP14 and GP16) and derived from untreated (Panel A) or EGF treated (Panel B) HaCaT cells. Three double-spotted clones that show differential hybridization signals are marked on each array. The GenBank Accession numbers of the clones and the corresponding genes are H10045 and H10098, corresponding to vav-3 and AF067817 (square)(Katzav et al., EMBO J. 8:2283-2290 (1989)); H28735, gene unknown, similar to heparan sulfate 3-O-sulfotransferase-1, AF019386 (circle)(Shworak et al., J. Biol. Chem. 272:28008-28019 (1997)); and R48633, gene unknown (diamond).

[0237] FIG. 3 shows the results of hybridization of targets from these fingerprints to the arrays. As shown in FIGS. 3A and 3B, arrayed clones corresponding to the 291 nucleotide (vav-3, marked by square) and 317 nucleotide (similar to heparin sulfate N-sulfotransferase (N-HSST), marked by circle) RAP-PCR fragments are indicated. The sequences of these RAP-PCR fragments were determined. Also indicated on this array is a differentially regulated gene that could not be visualized on the original fingerprint gel (marked by diamond).

[0238] Comparing FIGS. 3A and 3B, a more than 10-fold down-regulation was observed for vav-3 upon treatment with EGF. The gene corresponding to H28735 was up-regulated more than 10-fold with EGF treatment. The gene corresponding to R48633 was up-regulated about 3-fold with EGF treatment. These changes in gene expression in response to EGF were independently confirmed by RT-PCR.

[0239] This result indicates that differential gene regulation can be detected by the combined fingerprinting and array approach even when the event cannot be detected using the standard gel electrophoresis approach.

[0240] FIG. 3C shows an array hybridized with a RAP-PCR target using the same RNA as in panel A but with a different pair of primers, Nucl+ and OPN24. As shown in FIG. 3C, using a different set of primers yields an entirely different pattern of hybridizing genes. FIG. 3D shows an array hybridized with a cDNA generated by reverse transcription of 1 μg poly(A)⁺-selected mRNA. FIG. 3E shows an array hybridized with human genomic DNA labeled using random priming.

[0241] The data were analyzed in a number of ways. First, estimates were made of the overlap between the clones hybridized by each target. In all pairwise comparisons between all of the different types of targets, there was less than 5% overlap among the 500 clones that hybridized most intensely (compare FIGS. 3A, 3B, 3D, and 3E). Of the top 500 clones hybridized by the genomic target, which included nearly all clones known to contain the Alu repeats, less than 5% overlapped with the top 500 clones hybridized by the fingerprint targets or the total poly(A)⁺ mRNA target. This indicated that, except for the case of a genomic target, there was no significant hybridization to dispersed repeats. The overlap among the clones hybridized by the two RAP-PCR fingerprints generated with different primers was less than 3%, and the overlaps of either fingerprint with the poly(A)⁺ mRNA target were both less than 3%. Thus, most of the cDNAs detected using a target from the fingerprints could not be detected using the total mRNA target. These results indicate that RAP-PCR samples a population of mRNAs largely independently of message abundance. This is because the low abundance class of messages has much higher complexity than the abundant class, making it more likely that the arbitrary primers will find good matches. Unlike differential display, RAP-PCR demands two such arbitrary priming events, possibly biasing RAP-PCR toward the complex class. Overall, these data suggest that the majority of the mRNA population in a cell (<20,000 mRNAs) can be found in as few as ten RAP-PCR fingerprints.
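
The pairwise overlap estimate described above can be sketched as a comparison of the top-ranked clones for two targets. This is an illustrative sketch only: the clone identifiers and intensities are made up, the function names are hypothetical, and the example uses the top 3 clones rather than the top 500 so that it stays readable.

```python
def top_clones(signal_by_clone, n):
    """Return the set of the n clone identifiers with the highest signal."""
    ranked = sorted(signal_by_clone, key=signal_by_clone.get, reverse=True)
    return set(ranked[:n])


def overlap_fraction(target_a, target_b, n):
    """Fraction of the top-n clones that two targets have in common."""
    shared = top_clones(target_a, n) & top_clones(target_b, n)
    return len(shared) / n


# Hypothetical hybridization intensities keyed by clone identifier.
fingerprint_target = {"c1": 90, "c2": 80, "c3": 70, "c4": 5, "c5": 1}
poly_a_target = {"c1": 2, "c2": 1, "c3": 95, "c4": 85, "c5": 75}
print(overlap_fraction(fingerprint_target, poly_a_target, 3))
```

With real array data, n would be set to 500 and the dictionaries would hold one quantitated signal per arrayed clone.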

[0242] A total of 30 differentially hybridizing cDNA clones were detected among about 2000 hybridizing colonies using targets derived from both sets of arbitrary primers (FIG. 2) at a threshold of about three-fold differential hybridization. Twenty-two of these differentially hybridizing clones displayed differential hybridization at both RNA concentrations. These 22 were further characterized by RT-PCR. Differentially expressed genes exhibiting greater than a two-fold difference in expression in response to EGF treatment are shown in Table 1. For the results shown in Table 1, differential expression was confirmed by low stringency RT-PCR. The left column gives the accession numbers of the EST clones (5′ or 3′, or both when available). The right column gives the corresponding gene or the closest homolog. In cases of very low homologies, the gene is considered unknown. The cutoff for homology was p<e-20 in tblastx.

TABLE 1
Genes Regulated More than Two-fold After EGF Treatment of HaCaT Keratinocytes

Accession number            Gene name
--------------------------  ---------------------------------------------------------------------
Up-regulated
H11520 (3′)                 unknown
H11161 (5′)/H11073 (3′)     TSC-22 (U35048)
R48633 (5′)                 unknown
H28735 (3′)                 similar to heparan sulfate 3-O-sulfotransferase-1 precursor (AF019386)
H25513 (5′)/H25514 (3′)     Fibronectin receptor α-subunit (M13918)
H12999 (5′)/H05639 (3′)     similar to Focal adhesion kinase (FAK2) (L49207)
H15184 (5′)/H15124 (3′)     ray gene (X79781)
H25195 (5′)/H24377 (3′)     X-box binding protein-1 (XBP-1) (M31627)
H23972 (‘ ’)                unknown
H27350 (5′)                 CPE-receptor (hCPE-R) (AB000712)
R75916 (5′)                 similar to semaphorin C (X85992)
Down-regulated
R73021 (5′)/R73022 (3′)     epithelium-restricted Ets protein ESX (U66894)
H10098 (5′)/H10045 (3′)     vav-3 (AF067817)

[0243] The eight false-positive clones that appeared to be regulated at only one concentration were further characterized. Of these eight, five false-positive clones showed differential hybridization at one concentration but were present and not regulated on the membranes for the other concentration. The most likely source of this type of false-positive is the membranes. Although each clone is spotted twice, it is possible that occasionally one membrane received substantially more, or less, DNA in both spots than the other three membranes for these clones. However, this potential difference was easily detected and is rare, occurring only five times in over 2000 clones. The other three false-positive clones hybridized under only one treatment condition and at only one RNA concentration used for RAP-PCR. These three false-positive clones could be differentially expressed genes or could be false-positives from variable PCR products. However, the number of false positives was very low, and they were easily identified by comparing the results of two targets derived from PCR of different starting concentrations of RNA.
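
The selection rule used here, a roughly three-fold difference that must be seen at both RNA concentrations, can be sketched as follows. This is a minimal illustration under stated assumptions: signals are positive, background-corrected intensities, the example values are invented, and the function names are hypothetical.

```python
def direction(treated, untreated, threshold=3.0):
    """Return 'up', 'down', or None for one treated/untreated signal pair.

    Assumes both signals are positive (for example after applying a
    background floor), so the ratio is always defined.
    """
    ratio = treated / untreated
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return None


def call_clone(signals, threshold=3.0):
    """signals: list of (treated, untreated) pairs, one per RNA concentration."""
    calls = [direction(t, u, threshold) for t, u in signals]
    if all(c is not None for c in calls) and len(set(calls)) == 1:
        return calls[0]          # regulated in the same direction at every concentration
    if any(c is not None for c in calls):
        return "unconfirmed"     # candidate false positive: seen at one concentration only
    return "unchanged"


print(call_clone([(900.0, 100.0), (400.0, 90.0)]))   # differential at both concentrations
print(call_clone([(900.0, 100.0), (110.0, 100.0)]))  # differential at one concentration only
```

Clones returned as "unconfirmed" correspond to the class of false positives discussed above and would not be carried forward to RT-PCR confirmation.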

[0244] Differential expression was confirmed using low stringency RT-PCR. Only those hybridization events that indicated differential expression at both input RNA concentrations were further characterized. For confirmation of differential expression, RT-PCR was used with specific targets rather than Northern blots, which are much less sensitive than RT-PCR, because it was expected that many of the mRNAs would be rare and in low abundance. One of the advantages of using the arrays from the I.M.A.G.E. consortium is that more than 70% of the clones have single pass sequence reads from the 5′ or 3′ end, or both, deposited in the GenBank database.

[0245] Clones for which some sequence is available in the database were chosen for further characterization. Five of the 22 ESTs representing differentially regulated genes on the array had not been sequenced and two of the remaining 17 ESTs were from the same gene. The remaining 15 unique sequenced genes were aligned with other sequences in the database in order to derive a higher quality sequence from multiple reads and longer sequence from overlapping clones. The UniGene database clusters human and mouse ESTs that appear to be from the same gene (Schuler, J. Mol. Med. 75:694-698 (1997)). This database greatly aids in the process of assembling a composite sequence from different clones of the same mRNA (http://www.ncbi.nlm.nih.gov/UniGene/index.html). These composite sequences were then used to choose primers for RT-PCR.

[0246] For each gene, two specific primers were used in RT-PCR under low stringency conditions similar to those used to generate RAP-PCR fingerprints. In addition to the product of interest, a pattern of arbitrary products was generated, which is largely invariant and behaves as an internal control for RNA quality and quantity, and for reverse transcription efficiency (Mathieu-Daude et al., supra, 1998). The number of PCR cycles was adjusted to between 14 and 25 cycles, according to the abundance of the product, in order to preserve the differences in starting template mRNA abundances. This is necessary because rehybridization of abundant products during the PCR inhibits their amplification, and the difference in product abundances diminishes as the number of PCR cycles increases (Mathieu-Daude et al., Nucleic Acids Res. 24:2080-2086 (1996)).
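
The reason for limiting the cycle number can be illustrated with a toy calculation. The model below is an assumption made purely for illustration and is not taken from the cited reference: per-cycle efficiency is taken to fall as product accumulates, with a hypothetical `capacity` parameter setting where inhibition becomes noticeable. Under that assumption, a 10-fold difference in starting template between two samples shrinks as cycling continues.

```python
def amplify(start, cycles, capacity=1e6):
    """Toy PCR model: efficiency drops from 1 toward 0 as product accumulates."""
    amount = float(start)
    for _ in range(cycles):
        efficiency = 1.0 / (1.0 + amount / capacity)  # assumed inhibition term
        amount *= 1.0 + efficiency
    return amount


def observed_ratio(high_start, low_start, cycles):
    """Ratio of product between two samples amplified in separate tubes."""
    return amplify(high_start, cycles) / amplify(low_start, cycles)


# Two samples differing 10-fold in starting template (arbitrary units).
for cycles in (14, 19, 25, 35):
    print(cycles, round(observed_ratio(10.0, 1.0, cycles), 2))
```

In this sketch the observed ratio stays near 10 at low cycle numbers and moves toward 1 at high cycle numbers, which is the qualitative behavior that motivates choosing the cycle number according to product abundance.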

[0247] Low stringency RT-PCR experiments confirmed the differential expression of the two transcripts that were identified in the RAP-PCR fingerprints of FIG. 2A and showed differential hybridization to the cDNA array (compare FIG. 3A versus 3B). One of these differentially expressed genes corresponds to a new family member of the vav protooncogene family (Katzav et al., supra, 1989; Katzav, Crit. Rev. Oncog. 6:87-97 (1995); Bustelo, Crit. Rev. Oncog. 7:65-88 (1996); Romero and Fischer, Cell Signal. 8:545-553 (1996)). The other differentially expressed gene has homology to heparan sulfate 3-O-sulfotransferase-1 (Shworak et al., supra, 1997).

[0248] The other 13 differentially expressed genes were also tested and 11 were confirmed using low stringency RT-PCR. Some of the differentially expressed genes are shown in FIG. 4. Reverse transcription was performed at two RNA concentrations (500 ng, left column; 250 ng, right column). The reaction was diluted 4-fold in water and one fourth was used for low stringency RT-PCR at different cycle numbers. The RT-PCR products were resolved on polyacrylamide-urea gels. Shown are bands for the control (22 cycles); for GenBank accession number H11520 (22 cycles); for TSC-22, corresponding to GenBank accession numbers H11073 and H11161 (19 cycles) (Jay et al., Biochem. Biophys. Res. Commun. 222:821-826 (1996); Dmitrenko et al., Tsitol. Genet. 30:41-47 (1996); Ohta et al., Eur. J. Biochem. 242:460-466 (1996)); and for GenBank accession number R48633 (19 cycles). Genes corresponding to H11520 and TSC-22 are up-regulated about 8-10 fold with EGF treatment. The gene corresponding to R48633 is up-regulated about 3-fold with EGF treatment.

[0249] Of the two differentially expressed genes that were not confirmed, one proved unamplifiable. The other gene gave a product but appeared not to be differentially regulated when analyzed by RT-PCR.

[0250] RAP-PCR targets were very effective at detecting rare, low abundance mRNAs. Each fingerprint hybridized to a set of clones almost entirely different from the set hybridized by a target derived from poly(A)⁺-selected mRNA (see FIG. 3). In addition, numerous other primer pairs, membranes, and sources of RNA consistently showed less than a 5% overlap between clones hybridized by any two fingerprints, or between a fingerprint and a total poly(A)⁺-selected cDNA target. Detection of differentially expressed vav-3 mRNA, which is a new member of the vav oncogene family, was attempted using a Northern blot of poly(A)⁺-selected RNA. Despite being able to detect serially diluted vector down to the equivalent of a few copies per cell, vav-3 mRNA was undetectable on the Northern blot, whereas RT-PCR confirmed expression. A G3PDH control was used to confirm that the conditions used in the Northern blot could detect a control gene. Therefore, vav-3 appears to be a low abundance message that is represented in a RAP-PCR fingerprint as a prominent band.

[0251] The frequency of homologs of cDNAs detected by the RAP-PCR targets in the EST database was determined (>98% identity). This was compared to the frequency of homologs for a random set of other cDNAs on the same membrane. If the RAP-PCR fingerprints were heavily biased towards common mRNAs, then many would occur often in the EST database because it is partly derived from cDNA libraries that are not normalized or incompletely normalized. However, the cDNAs detected by RAP-PCR had frequencies in the EST database comparable to the frequencies for randomly selected cDNAs, including cases where the clone was unique in the database. These results indicate that sampling by arbitrarily sampled targets generated by RAP-PCR is at least as good as random sampling of the partly normalized libraries used to construct the array, and very different from that obtained for a target such as a total mRNA target.

[0252] These results demonstrate that an arbitrarily sampled target generated using RT-PCR and arbitrary primers can detect genes differentially expressed in response to EGF.

EXAMPLE III An Arbitrarily Sampled Target Generated by Differential Display Detects Genes Differentially Expressed in Response to EGF

[0253] This example shows the use of differential display to generate an arbitrarily sampled target and detection of differentially expressed genes responsive to EGF.

[0254] RNA was prepared from the human keratinocyte cell line HaCaT as described in Example II. Briefly, cells were grown to confluence and maintained at confluence for 2 days. The medium was changed 1 day prior to the experiment. EGF (Gibco-BRL) was added at 20 ng/ml. Treated and untreated cells were harvested after 4 hrs and total RNA was prepared with the RNEASY total RNA purification kit (Qiagen) according to the manufacturer's protocol. To remove remaining genomic DNA, the extracted total RNA was treated with RNase-free DNase (Boehringer Mannheim) and cleaned again using the RNEASY kit. The purified RNA was adjusted to 400 ng/μl in water and checked for quality by agarose gel electrophoresis.

[0255] For standard differential display, differential display was performed using the materials supplied in the RNAIMAGE kit (GenHunter Corporation; Nashville Tenn.), AMPLITAQ DNA polymerase (Perkin-Elmer-ABI; Foster City Calif.) and α-(³²P)-dCTP according to the manufacturer's protocol, except that each RNA template was used at four different concentrations, 800, 400, 200 and 100 ng per 20 μl reaction, with each anchored oligo(dT) primer (0.2 μM). The PCR reaction contained 2 μM dNTPs, for a total of 4 μM, including the carryover from the cDNA mix, 0.2 μM each primer, and one tenth of the newly synthesized cDNA, corresponding to 80, 40, 20 and 10 ng RNA. The anchored oligo(dT) primers were used in all possible combinations with four different arbitrary primers. The anchored oligo(dT) primers used were H-T₁₁G (HTTTTTTTTTTTG; SEQ ID NO:); H-T₁₁A (HTTTTTTTTTTTA; SEQ ID NO:); and H-T₁₁C (HTTTTTTTTTTTC; SEQ ID NO:), where H is AAGC, which is an arbitrary sequence used as a clamp to ensure the primers stay in register and have a high Tm at subsequent PCR steps. The arbitrary primers used were H-AP1 (AAGCTTGATTGCC; SEQ ID NO:); H-AP2 (AAGCTTCGACTGT; SEQ ID NO:); H-AP3 (AAGCTTTGGTCAG; SEQ ID NO:); and H-AP4 (AAGCTTCTCAACG; SEQ ID NO:).

[0256] For modified differential display, reverse transcription was performed using four different concentrations of each RNA template, 1000, 500, 250 and 125 ng per 10 μl reaction. The reaction mix contained 1.5 μM oligo(dT) anchored primers AT₁₅A, GT₁₅G, and T₁₃V, 50 mM Tris, pH 8.3, 75 mM KCl, 3 mM MgCl₂, 20 mM DTT, 0.2 mM each dNTP, 8 U RNase inhibitor (Boehringer Mannheim) and 20 U MuLV reverse transcriptase (Promega). The anchored primers were AT₁₅A (ATTTTTTTTTTTTTTTA; SEQ ID NO:); GT₁₅G (GTTTTTTTTTTTTTTTG; SEQ ID NO:); and T₁₃V (TTTTTTTTTTTTTV; SEQ ID NO:; where V is A, G or C). The reaction mix was ramped for 5 min from 25° C. to 37° C., held at 37° C. for 1 hr, and finally the enzyme was inactivated at 94° C. for 5 min. The newly synthesized cDNA was diluted 4-fold in water.

[0257] The PCR was performed after adding 10 μl of reaction mix to 10 μl of the diluted cDNAs, corresponding to 250, 125, 62.5 and 31.25 ng of RNA, to yield a 20 μl final reaction volume containing 2 μM anchored oligo(dT) primer, 0.4 μM arbitrary primer, either KA2 (GGTGCCTTTGG; SEQ ID NO:) or OPN28 (GCACCAGGGG; SEQ ID NO:), 2.5 units AMPLITAQ DNA polymerase Stoffel fragment (Perkin Elmer-ABI), 2 μCi α-(³²P)-dCTP, 175 μM each dNTP, 10 mM Tris, pH 8.3, 10 mM KCl, and 3.125 mM MgCl₂. These concentrations do not include the carryover from the reverse transcription reaction. The reactions were thermocycled for 35 cycles of 94° C. for 40 sec, 40° C. for 1 min and 40 sec, and 72° C. for 40 sec.
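
The RNA equivalents quoted for the PCR step follow directly from the reverse transcription volume, the dilution, and the aliquot taken. The short sketch below reproduces that arithmetic for both protocols. The function name is hypothetical, and the 2 μl aliquot for the standard protocol is an inference from "one tenth" of a 20 μl reaction rather than a stated volume.

```python
def rna_equivalent_ng(rna_ng, rt_volume_ul, dilution_factor, aliquot_ul):
    """RNA equivalent (ng) carried into PCR by an aliquot of diluted cDNA."""
    diluted_volume_ul = rt_volume_ul * dilution_factor
    return rna_ng * aliquot_ul / diluted_volume_ul


# Modified protocol: 10 ul reverse transcription, diluted 4-fold, 10 ul used per PCR.
modified = [rna_equivalent_ng(ng, 10, 4, 10) for ng in (1000, 500, 250, 125)]

# Standard protocol: 20 ul reverse transcription, undiluted, one tenth (assumed 2 ul) used.
standard = [rna_equivalent_ng(ng, 20, 1, 2) for ng in (800, 400, 200, 100)]

print(modified)
print(standard)
```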

[0258] Aliquots of the PCR products resulting from the four different concentrations of the same RNA template were displayed side by side on a 5% polyacrylamide gel and visualized by autoradiography as described in Example II.

[0259] For labeling of differential display products for use as targets to probe arrays, random primed labeling of the differential display products was performed as described in Example II. The differential display PCR reactions (14 μl) were purified using a QIAQUICK PCR Purification Kit (Qiagen) and the DNA was recovered in 50 μl 10 mM Tris, pH 8.3. Random primed synthesis was performed using a standard protocol. Briefly, 5 μl of the recovered differential display products were combined with 3 μg random hexamers, boiled for 3 min and placed on ice. The hexamer/DNA mix was combined with the reaction mix to yield a 25 μl reaction containing 0.05 mM three dNTPs (minus dCTP), 50 μCi of 3000 Ci/mmol α-(³²P)-dCTP, 1× Klenow fragment buffer, and 4 U Klenow fragment (Gibco-BRL). The reaction was performed at room temperature for 4 hrs, chased for 15 min at room temperature by adding 1 μl of 1.25 mM dCTP, and incubated for an additional 15 min at 37° C. The unincorporated nucleotides and hexamers were removed with the Qiagen Nucleotide Removal Kit and the purified products were eluted using two aliquots of 140 μl 10 mM Tris, pH 8.3.
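
As a worked illustration of the labeling reaction, the amount of labeled dCTP supplied by 50 μCi at 3000 Ci/mmol, and the size of the unlabeled chase relative to it, can be computed as follows. The sketch ignores radioactive decay since the reference date of the label, and the function names are hypothetical.

```python
def labeled_pmol(activity_uci, specific_activity_ci_per_mmol):
    """Picomoles of nucleotide: uCi / (Ci/mmol) = 1e-6 mmol = 1000 pmol."""
    return activity_uci / specific_activity_ci_per_mmol * 1000.0


def micromolar(pmol, volume_ul):
    """pmol per microliter is numerically equal to micromolar."""
    return pmol / volume_ul


label_pmol = labeled_pmol(50.0, 3000.0)   # labeled dCTP in the reaction
label_um = micromolar(label_pmol, 25.0)   # concentration in the 25 ul reaction
chase_pmol = 1.0 * 1.25 * 1000.0          # 1 ul of 1.25 mM dCTP = 1.25 nmol

print(round(label_pmol, 2), round(label_um, 3), round(chase_pmol / label_pmol, 1))
```

On these figures the labeled dCTP is present at well below 1 μM, far less than the 50 μM of each unlabeled dNTP, and the chase adds roughly 75 times more dCTP than the label supplied.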

[0260] Hybridization to the array was performed essentially as described in Examples I and II. Briefly, the cDNA membranes (Genome Systems) were prewashed in three changes of prewash solution, containing 2×SSC and 0.1% SDS, in a horizontally shaking flat bottom container to reduce the residual bacterial debris. The first wash used 500 ml of prewash buffer for 10 min at room temperature. The second and third washes were each carried out in 1 liter of prewash solution, prewarmed to 55° C., for 10 min.

[0261] The membranes were transferred to large roller bottles and prehybridized in 60 ml prehybridization solution, prewarmed to 42° C., containing 6×SSC, 5× Denhardt's reagent, 0.5% SDS, 100 μg/ml fragmented, denatured salmon sperm DNA, and 50% formamide for 1-2 hrs at 42° C.

[0262] The prehybridization solution was removed, and 10 ml hybridization solution, prewarmed to 42° C. and containing 6×SSC, 0.5% SDS, 100 μg/ml fragmented, denatured salmon sperm DNA and 50% formamide, was added to the bottles. To decrease the background hybridization due to repeats such as Alu and LINE elements, sheared human genomic DNA was denatured in a boiling water bath for 10 min and immediately added to the hybridization solution to a final concentration of 10 μg/ml. An aliquot of 10 ng/ml poly(dA) was added to block oligo(dT) stretches in the radiolabeled target. Simultaneously, the labeled target was denatured in a boiling water bath for 4 min and immediately added to the hybridization solution. The hybridizations were carried out at 42° C. for 18-20 hrs.

[0263] Following hybridization, the hybridization solution was poured off and the membranes were thoroughly washed in six changes of wash solution, including a transfer of the membranes from the roller bottles to a horizontally shaking flat bottom container and back to the roller bottles, over 2-3 hrs. The stringency of the washes was increased stepwise from 2×SSC and 0.1% SDS at room temperature to 0.1×SSC and 0.1% SDS at 64° C. The separate washes were maintained at exactly the same indicated temperatures for all of the membranes. The last high stringency wash was at least 40 min to ensure exactly equilibrated temperatures in all bottles. The final wash solution was removed, and the membranes were briefly rinsed in 2×SSC at room temperature, blotted with 3MM paper, wrapped in SARAN wrap while moist, and placed against Kodak Biomax film (Eastman-Kodak; Rochester, N.Y.).
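
To make the stepwise increase in stringency concrete, the salt content of each SSC strength can be computed from the conventional 20×SSC stock, taken here as 3 M sodium chloride and 0.3 M sodium citrate. The stock composition is the standard laboratory definition and is stated as an assumption because it is not given in this example; the function name is hypothetical.

```python
# Conventional 20x SSC stock, in millimolar (assumed standard composition).
SSC_20X_MM = {"NaCl": 3000.0, "sodium citrate": 300.0}


def ssc_composition(strength):
    """Millimolar concentrations of each solute at the given SSC strength (e.g. 0.1 for 0.1x)."""
    return {solute: conc * strength / 20.0 for solute, conc in SSC_20X_MM.items()}


for strength in (6.0, 2.0, 0.1):
    print(strength, ssc_composition(strength))
```

Moving from 2×SSC to 0.1×SSC lowers the salt concentration 20-fold, which together with the higher temperature accounts for the increased stringency of the final washes.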

[0264] Differential expression was confirmed using low stringency RT-PCR. The first level of confirmation was the use of two RNA concentrations per sample. Only those hybridization events that indicated differential expression at both RNA concentrations in both RNA samples were further characterized.

[0265] Nucleotide sequences, which were available from Genome Systems, the commercial source of the array, or were sequenced, were used to derive PCR primers of 18 to 25 bases in length using MacVector 6.0 (Oxford Molecular Group). Generally, primers were chosen to generate PCR products of 100 to 250 base pairs, to have melting temperatures of at least 60° C., and preferably to be located close to the polyadenylation site of the mRNA so as to reduce the chance of sampling family members.

[0266] Reverse transcription was performed on total RNA using two RNA concentrations per sample and an oligo-(dT₁₅) primer (TTTTTTTTTTTTTTT; SEQ ID NO:; Genosys). The reactions contained 100 and 50 ng per μl total RNA, 0.5 μM oligo-(dT₁₅) primer (SEQ ID NO:), 50 mM Tris, pH 8.3, 75 mM KCl, 3 mM MgCl₂, 20 mM DTT, 0.2 mM of each dNTP, 0.8 U/μl RNase inhibitor (Boehringer Mannheim) and 2 U/μl of MuLV-reverse transcriptase (Promega). The reactions were ramped for 5 min from 25° C. to 37° C. and held at 37° C. for 1 hr. The enzyme was inactivated by heating the reactions at 94° C. for 5 min and the newly synthesized cDNA was diluted 4-fold in water.

[0267] Diluted cDNAs (10 μl) were mixed with 2×PCR mixture containing 20 mM Tris, pH 8.3, 20 mM KCl, 6.25 mM MgCl₂, 0.35 mM of each dNTP, 3 μM of each specific primer, 2 μCi α-(³²P)-dCTP (ICN, Irvine, Calif.) and 2 U AMPLITAQ DNA polymerase Stoffel fragment (Perkin-Elmer-Cetus) for a 20 μl final reaction volume. A low stringency thermal profile was used: 94° C. for 40 sec, 40° C. for 40 sec, and 72° C. for 1 min, for 17 and 19 cycles in separate tubes. The reaction was carried out in two sets of tubes at different cycle numbers because the abundance of the transcripts, the performance of the primer pairs and the amplifiability of the PCR products can vary. PCR products were run under the same conditions as described above on a 5% polyacrylamide and 43% urea gel. The gel was dried and placed for 18 to 72 hours on a phosphoimager screen and read with a STORM phosphoimager (Molecular Dynamics; Sunnyvale Calif.). Invariance among the other arbitrary products in the fingerprint was used as an internal control to indicate the reliability of the relative quantitation. The gene-specific products from four sets of reactions per differentially regulated gene were quantitated using IMAGEQUANT Software (Molecular Dynamics).
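
The final concentrations in the 20 μl reaction follow from mixing 10 μl of the 2× mixture with 10 μl of diluted cDNA. The sketch below performs that dilution for the components listed above. It excludes any carryover from the reverse transcription reaction, and the dictionary keys and function name are hypothetical labels chosen for illustration.

```python
# Components of the 2x PCR mixture, as listed in the text.
TWO_X_MIX = {
    "Tris pH 8.3 (mM)": 20.0,
    "KCl (mM)": 20.0,
    "MgCl2 (mM)": 6.25,
    "each dNTP (mM)": 0.35,
    "each primer (uM)": 3.0,
}


def final_concentrations(mix, mix_volume_ul, total_volume_ul):
    """Scale each stock concentration by the fraction of the final volume it occupies."""
    factor = mix_volume_ul / total_volume_ul
    return {name: conc * factor for name, conc in mix.items()}


final = final_concentrations(TWO_X_MIX, 10.0, 20.0)
for name, conc in final.items():
    print(name, conc)
```

The resulting values (10 mM Tris, 10 mM KCl, 3.125 mM MgCl₂, 0.175 mM each dNTP and 1.5 μM each primer) match the buffer, dNTP and specific-primer concentrations stated elsewhere in these examples.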

[0268] Primer pairs were used to confirm differential expression.

[0269] For GenBank accession number R72714 (Egr-1)(155 nt product); primer A, CACGTCTTGGTGCCTTTTGTGTG (SEQ ID NO:); primer B, GAAGCTCAGCTCAGCCCTCTTCC (SEQ ID NO:).

[0270] For GenBank accession number H14529 (ACTB, β-actin)(174 nt product); primer A, CCAGGGAGACCAAAAGCCTTCATAC (SEQ ID NO:); primer B, CACAGGGGAGGTGATAGCATTGC (SEQ ID NO:).

[0271] For GenBank accession number H27389 (A+U-rich element RNA binding factor)(144 nt product); primer A, GTGCTTTTCAAAGATGCTGCTAGTG (SEQ ID NO:); primer B, GCTCAATCCACCCACAAAAACC (SEQ ID NO:).

[0272] For GenBank accession number H05545 (protein phosphatase 2A catalytic subunit)(141 nt product); primer A, TCCTCTCACTGCCTTGGTGGATG (SEQ ID NO:); primer B, CACAGCAAGTCACACATTGGACCC (SEQ ID NO:).

[0273] For GenBank accession number H27969 (103 nt product); primer A, CCAAAGACATTCAGAGGCATGG (SEQ ID NO:); primer B, GAGGTGGGGAAGGATACAGCAG (SEQ ID NO:).

[0274] For GenBank accession number R73247 (inositol tris phosphate kinase)(168 nt product); primer A, GAAAAGGGTTGGGGAGAAGCCTC (SEQ ID NO:); primer B, TCTCTAGCGTCCTCCATCTCACTGG (SEQ ID NO:).

[0275] For GenBank accession number H21777 (α-tubulin isoform 1) (155 nt product); primer A, ACAACTGCATCCTCACCACCCAC (SEQ ID NO:); primer B, GGACACAATCTGGCTAATAAGGCGG (SEQ ID NO:).

[0276] Total RNA was obtained from immortalized HaCaT keratinocytes, treated and untreated with EGF, as described in Example II (Boukamp et al., supra, 1997). The first differential display protocol tried was the RNAimage kit 1 (cat. G501; GenHunter). The anchor primers, oligo(dT)-G (H-T₁₁G; SEQ ID NO:), oligo(dT)-C (H-T₁₁C; SEQ ID NO:) or oligo(dT)-A (H-T₁₁A; SEQ ID NO:), were used for reverse transcription, and then each cDNA was used for PCR in combination with four different arbitrary primers, H-AP1 (SEQ ID NO:), H-AP2 (SEQ ID NO:), H-AP3 (SEQ ID NO:) and H-AP4 (SEQ ID NO:).

[0277] As shown in FIG. 5, the fingerprints were resolved on a denaturing acrylamide gel to determine the quality of the reactions. Differential display reactions were performed using the RNAIMAGE kit protocol (GenHunter Corporation) according to the manufacturer's suggestion except that four different starting concentrations of 800, 400, 200 and 100 ng of total RNA were used. One tenth of this material was then used for PCR. The anchored oligo(dT) primer H-T₁₁C (SEQ ID NO:) was used with two different arbitrary primers, H-AP3 (SEQ ID NO:) and H-AP4 (SEQ ID NO:), as indicated. The arbitrary primer H-AP4 (SEQ ID NO:) was used with two different anchored oligo(dT) primers, H-T₁₁C (SEQ ID NO:) and H-T₁₁A (SEQ ID NO:). The reactions that share either the arbitrary primer or the anchored oligo(dT) primer showed almost no overlap in the visible bands.

[0278] FIG. 5B shows differential display using a different set of primers. Differential display was performed using the arbitrary primer KA2 (SEQ ID NO:) with three different anchored oligo(dT) primers, T₁₃V (SEQ ID NO:), AT₁₅A (SEQ ID NO:), and GT₁₅G (SEQ ID NO:), as indicated. The differential display protocol was adjusted to yield more mass and a higher complexity of the generated products. The starting concentrations of RNA were 1000, 500, 250 and 125 ng. One fourth of this material was then used for PCR. As observed in FIG. 5A, using different oligo(dT) anchored primers changes the pattern of the displayed bands almost entirely.

[0279] The fingerprints generated about 30 to 50 clearly visible products (see FIG. 5A). Fingerprints were generally reproducible in the range from 100 to 800 ng of total mRNA used in these experiments, with very few RNA concentration dependent products. Three of the most reproducible fingerprints that shared either an oligo(dT) anchored primer or an arbitrary primer (FIG. 5A) were radiolabeled by random priming in the presence of three unlabeled dNTPs and α-(³²P)-dCTP, and each was used to probe identical arrays of 18,000 double spotted E. coli colonies carrying ESTs from the I.M.A.G.E. consortium. The arrays were hybridized and washed as described above.

[0280] The kit protocol used 0.2 μM of the arbitrary primer and 4 μM dNTPs compared to 1 μM primers and 200 μM dNTPs used in the RAP-PCR protocol described in Example II. The fingerprint reaction contained less than 40 ng of product in 20 μl, presumably because of limiting components. This was about five times less DNA than used in the method described in Example II. For this reason, it took about ten days with an intensifying screen to obtain an adequate exposure of X-ray film. Approximately 500 products were easily discernible with each target after a sufficient exposure. The number of reliably observable genes is usually increased at least two-fold when using a phosphoimager screen, indicating the greater sensitivity of phosphoimaging compared to X-ray film. Furthermore, pooling of separate labeled fingerprints into the same target can increase throughput even further.

[0281] In order to reduce the exposure time for target hybridization to arrays, experiments were performed at the higher concentration of primer and dNTPs described in Example II using RAP-PCR protocols (FIG. 5B). These experiments yielded the expected increase in product mass and a corresponding reduction in exposure times for arrays.

[0282] The selectivity of oligo(dT) primers was determined using different anchor bases. As shown in FIG. 6, differential display reactions were hybridized to cDNA arrays. The differential display products generated as described in FIG. 5A, with the primers GT₁₅G (SEQ ID NO:) and KA2 (SEQ ID NO:) from untreated (FIG. 6A) and EGF treated (FIG. 6B) HaCaT cells, were labeled by random priming and hybridized to cDNA arrays. A section representing less than 5% of a membrane is shown with a differentially regulated gene indicated by an arrow. FIG. 6C shows hybridization of differential display products generated with the primers AT₁₅A (SEQ ID NO:) and KA2 (SEQ ID NO:) from untreated HaCaT cells. Comparing FIG. 6A versus 6C, there is a significant overlap of hybridization signals that were not obvious from the polyacrylamide display (compare to FIG. 5B, lanes AT₁₅A/KA2 versus GT₁₅G/KA2).

[0283] When the arbitrary primer was changed while keeping the same anchor primer, the pattern of clones hybridized changed almost entirely, with typically less than 5% overlap between any two fingerprints. In contrast, targets containing the same arbitrary primer and different anchored primers shared about 30% of the clones to which they hybridized. FIGS. 6A and 6C show examples of such shared products from a small portion of an array.
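The overlap figures above can be illustrated with a short calculation. The following Python sketch is not part of the described method. It assumes that each target is summarized as a set of identifiers for the clones scored as hybridizing, and that overlap is expressed as the number of shared clones divided by the size of the smaller set. The function name, the clone identifiers and the choice of denominator are hypothetical and are used only for illustration.

```python
def overlap_fraction(clones_a, clones_b):
    """Return shared clones as a fraction of the smaller of the two clone sets.

    Assumption: each argument is a collection of identifiers for clones that
    were scored as hybridizing to one target. The denominator (smaller set)
    is an illustrative choice, not one stated in the text.
    """
    a, b = set(clones_a), set(clones_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical clone identifiers for two targets that share an arbitrary
# primer but use different anchored primers.
target_a = {f"c{i:02d}" for i in range(1, 11)}   # c01 .. c10
target_b = {f"c{i:02d}" for i in range(8, 18)}   # c08 .. c17

# Three of ten clones are shared between the two hypothetical targets.
print(f"{overlap_fraction(target_a, target_b):.0%}")
```

With real data, the same function could be applied to every pair of targets to tabulate which primer combinations sample largely independent sets of clones.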

[0284] Similar observations were made using fingerprints generated under a wide variety of conditions, including the protocols and primers from the GenHunter kit, modified protocols, and protocols using primers independent of those in the GenHunter kit. The possibility of this overlap being due to repeats was excluded by the use of genomic and total mRNA targets against the same membranes.

[0285] The overlap among targets that had different anchored primers but shared the same arbitrary primer was not reflected in any noticeable similarity in the fingerprint products when resolved on a denaturing polyacrylamide gel. For example, the targets used in FIGS. 6A and 6C are shown in FIG. 5B and show no easily discerned similarities, despite having 30% of the products in common. Many of the shared products were among the most intensely hybridizing clones on the array. Therefore, some of the products visible on the gel could share the arbitrary primer at one end but, during PCR, the products are preferentially primed at multiple different locations in the opposite direction by the different anchored primers. This would result in fingerprints that had little or no similarity in a polyacrylamide display while being compatible with the observation that targets with the same arbitrary primer but different anchored primers overlap by 30% in the clones to which they hybridize.

[0286] Shared products are a general phenomenon for anchored fingerprints that share an arbitrary primer under a fairly wide range of conditions. Overlap among fingerprints can be avoided by not using the same arbitrary primer with different anchored primers.

[0287] Comparison of the pattern of hybridizing clones with that generated by total genomic DNA indicated that the clones hybridizing to a target generated by the GenHunter fingerprint did not generally contain the Alu repetitive element that occurs in a few percent of mRNA 3′ untranslated regions (UTRs). The clones hybridized by the target did not overlap significantly with clones hybridized by a total cDNA target derived from reverse transcription of poly(A)⁺ mRNA, indicating that the genes sampled were not heavily biased towards the most abundant RNAs. These results are consistent with results obtained using only arbitrary primers for fingerprinting (see Example II) and indicate that arbitrary priming combined with anchored oligo(dT) priming can be used to monitor rare genes in cDNA arrays. These results also confirm that RAP-PCR and differential display are not heavily biased toward abundant transcripts.

[0288] Among over 2000 clones surveyed for differential gene expression between untreated and EGF treated HaCaT cells, there were 29 different clones that appeared to clearly reflect differential expression at one RNA concentration. The 12 clones having the highest signal to noise ratio and differential expression ratio were chosen and specific primers were designed for RT-PCR. An example of one of these differentially expressed genes is indicated by an arrow in FIG. 6A versus 6B.

[0289] Differential expression of at least 1.5-fold was confirmed for seven genes, which are shown in FIG. 7. Reverse transcription was performed at twofold different RNA concentrations. The reactions were diluted 4 fold in water and low stringency PCR was performed at different cycle numbers. The amount of input RNA/cDNA for each PCR reaction was 125 ng (left column) and 250 ng (right column). The reactions shown in FIG. 7 were carried out for 10 cycles and resolved on polyacrylamide-urea gels. Shown are products for the control (unregulated) and genes differing by at least 1.6-fold. The regulated genes shown correspond to GenBank accession numbers R72714, H14529, H27389, H05545, H27969, R73247, and H21777.

[0290] The regulation of the genes shown in FIG. 7 is summarized in Table 2. Identified genes regulated by a four-hour treatment with EGF, corresponding GenBank accession numbers, and the fold-increase in expression relative to untreated cells are shown.

TABLE 2
EGF Regulated Genes

Gene                                      Accession #               Fold Up-regulation by EGF
EGR1                                      R72714, X52541            8.3 ± 3.4
ACTB, beta-actin                          H14529, M10277            2.0 ± 0.3
A + U-rich element RNA binding factor     H27389, D89092, D89678    1.9 ± 0.3
Protein phosphatase 2A catalytic subunit  H05545, J03804            1.6 ± 0.4
Unknown                                   D31765, H27969            1.6 ± 0.4
Inositol tris phosphate kinase            R73247, U51336            1.6 ± 0.3
Alpha-tubulin isoform 1                   H21777, K00558            1.6 ± 0.3
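A fold up-regulation value with a ± spread of the kind reported in Table 2 could be derived from paired band intensities as sketched below in Python. This sketch is an illustration only. It assumes that each measurement yields one treated and one untreated band intensity, that fold change is their ratio, and that the ± value is a sample standard deviation across measurements; the text does not state how the spread was calculated. The function name and the intensity values are hypothetical.

```python
from statistics import mean, stdev


def fold_summary(intensity_pairs):
    """Return (mean fold change, sample standard deviation).

    intensity_pairs: iterable of (treated, untreated) band intensities,
    one pair per RT-PCR measurement. Treating the spread as a sample
    standard deviation is an assumption made for this sketch.
    """
    ratios = [treated / untreated for treated, untreated in intensity_pairs]
    if len(ratios) < 2:
        raise ValueError("at least two measurements are needed for a spread")
    return mean(ratios), stdev(ratios)


# Hypothetical intensities (arbitrary units) for one gene measured at
# different input amounts and cycle numbers.
pairs = [(2.0, 1.0), (4.0, 2.0), (3.0, 1.0)]
fold, spread = fold_summary(pairs)
print(f"{fold:.1f} ± {spread:.1f}")
```

Normalizing each intensity to the unregulated control band before taking the ratio would be a natural refinement, but it is omitted here to keep the sketch short.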

[0291] Egr-1 was previously known to be differentially regulated by EGF in other cell types (Iwami et al., Am. J. Physiol. 270:H2100-H2107 (1996); Kujubu et al., J. Neurosci. Res. 36:58-65 (1993); Cao et al., J. Biol. Chem. 267:1345-1349 (1992); Ito et al., Oncogene 5:1755-1760 (1990)). The observations of changes in β-actin and α-tubulin expression are likely associated with the dramatic change in morphology these cells undergo after EGF treatment. Regulation of β-actin and α-tubulin genes by EGF has been observed in other cell types (Torok et al., J. Cell Physiol. 167:422-433 (1996); Hazan and Norton, J. Biol. Chem. 273:9078-9084 (1998); Shinji et al., Hepatogastroenterology 44:239-244 (1997); Ball et al., Cell Motil. Cytoskeleton 23:265-278 (1992)). These observations independently validate the treatments and the method used to detect differential expression. The regulation of protein phosphatase 2A mRNA has not previously been observed but is consistent with the role of this protein in transduction of the EGF signal (Chajry et al., Eur. J. Biochem. 235:97-102 (1996)). Similarly, the gene associated with the metabolism of inositol phosphates had not previously been shown to be regulated by EGF but such regulation is consistent with the previous observation of increases in the compounds generated by this enzyme after EGF treatment in another ectodermal cell type (Contreras, J. Neurochem. 61:1035-1042 (1993)). Regulation of two other genes by EGF, an unknown gene, with GenBank accession number H27969, and an RNA binding protein, with GenBank accession number D89692, was not previously reported in any cell type. GenBank accession number D31765 corresponds to KIAA0061.

[0292] Five other genes were not confirmed to be regulated when RT-PCR was used. The number of false positives can vary from experiment to experiment and depends on the quality of the fingerprints and on the quality of the commercially available membranes. The number of false positives can be limited by using two RNA concentrations on arrays before confirmation by RT-PCR, as described in Example II. These experiments involved only a single concentration because the primary purpose was to determine the efficiency of coverage and overlap among targets made by the oligo(dT)-X anchored priming method. Nevertheless, over half of the differentially hybridizing clones observed at one concentration correspond to differentially expressed genes. When two array hybridizations were performed for each treatment at two different input template concentrations, the error rate was well below 10%.
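The confirmation figures above, seven of the 12 clones tested by RT-PCR confirmed and five not confirmed, can be checked with simple arithmetic, and the two-concentration safeguard can be expressed as a filter. The Python sketch below is illustrative only. The first function uses the counts given in the text. The second function assumes that a clone is kept as a candidate only when its treated/untreated array signal ratio meets a threshold at both input concentrations; the threshold of 1.6, the restriction to up-regulation, the function names and the clone identifiers are hypothetical.

```python
def confirmation_rate(confirmed, tested):
    """Fraction of RT-PCR-tested clones confirmed as differentially expressed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return confirmed / tested


def consistent_candidates(ratios_by_clone, threshold=1.6):
    """Keep clones whose ratio meets the threshold at both concentrations.

    ratios_by_clone maps a clone identifier to a pair of treated/untreated
    signal ratios, one per input RNA concentration. The threshold value is
    an assumption made for this sketch.
    """
    return sorted(
        clone
        for clone, (low_input, high_input) in ratios_by_clone.items()
        if low_input >= threshold and high_input >= threshold
    )


# Counts taken from the text: 7 of the 12 tested clones were confirmed.
print(f"{confirmation_rate(7, 12):.0%}")

# Hypothetical array ratios at two input concentrations.
ratios = {"cloneA": (2.0, 1.9), "cloneB": (2.5, 1.1), "cloneC": (1.0, 3.0)}
print(consistent_candidates(ratios))
```

Requiring agreement at both concentrations discards clones such as the hypothetical cloneB and cloneC, whose apparent regulation appears at only one input amount.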

[0293] These results demonstrate that an arbitrarily sampled target generated using differential display and arbitrary primers can detect genes differentially expressed in response to EGF.

[0294] Throughout this application various publications have been referenced. The disclosures of these publications in their entireties are hereby incorporated by reference in this application in order to more fully describe the state of the art to which this invention pertains.

[0295] Although the invention has been described with reference to the examples provided above, it should be understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the claims.

1 85 1 395 DNA Homo sapiens unsure (82) unsure (115)..(116) unsure (314)unsure (350) unsure (359) unsure (383) 1 tttttttttt acaacaatgcagtcatttat ttattgagta tgtgcacatt atggtattat 60 tactatactg attatatttaanaagtgact tctaattaga aaatgtatcc aaaannaaaa 120 cagcagatat acaaaattaaagagacagaa gatagacatt aacagataag gcaacttata 180 cattgaggaa tccaaatccaatacatttaa acatttggga aatgaggggg acaaatggga 240 agccagatca aatttgtgtaaaactattca gtatgtttcc cttggcttca tgtctgagga 300 agggctctcc cttncaatgggggatggaca aactccaaat gccacacaan tgtttaacng 360 gtatactagg tttcacactgggnacggggg ttaaa 395 2 389 DNA Homo sapiens unsure (230) unsure (384) 2acacagcccc ccgcccagcc agcatcgcag ggcttcaggg accaaccgca tagctgccta 60tgcccccgca gaactggctg ctgcgtgtga actgaacaga cggagaagat gtgctaggga 120gaatctgcct ccacagtcac ccatttcatt gctcgctgcg aaagagacgt gagactgaca 180tatgccatta tctcttttcc agtattaaac actcatatgc ttatggcttn gagaaatttc 240ttagttgggt gaattaaagg ttaatccgag aattagcatg gatataccgg gtcctcatgc 300agcttggcag atatctgaga aatggtttaa ttcatgctca ggagctgtgt gccttttcca 360tcccttccgg gtcccttacc cctnacttt 389 3 465 DNA Homo sapiens unsure (384)unsure (411) unsure (445) 3 tttttttttt tatcaacatt tatatgcttt attgaaagttgacaagtgca acagttaaat 60 acagtgacac cttacaattg tgtagagaac atgcacagaaacatatgcat ataactacta 120 tacaggtgat atgcagaaac ccctactggg aaatccatttcattagttag aactgagcat 180 ttttcaaagt attcaaccag actcaattga aagacttcagtgaacaagga tttacttcag 240 cgtattcagg caggctagga tttcaggatt acacaaagtgaggtaactgt gccaaattct 300 taaaatttct ttagggtgtg ggtttttgtc atgtagcagtttttatgtgg atctattata 360 taaaagtcca cacctcctca gacngccaat ggaaacaacttaaatttcca ntctgttaca 420 acctaattgg taggttacag tcccnttttg ttacaaatggttaca 465 4 1718 DNA Homo sapiens 4 ggcacgaggg gatccgcatc tgcctgggatcatcaagccc tagaagctgg gtttctttaa 60 attagggctg ccgttttctg tttctccctgggctgcggaa agccagaaga ttttatctag 120 cttatacaag gctgctggtg ttccctctttttttccacga gggtgttttt ggctgcaatt 180 gcatgaaatc ccaatggtgt agaccagtggcgatggatct aggagtttac caactgagac 240 atttttcaat ttctttcttg 
tcatccttgctggggactga aaacgcttct gtgagacttg 300 ataatagctc ctctggtgca agtgtggtagctattgacaa caaaatcgag caagctatgg 360 atctagtgaa aagccatttg atgtatgcggtcagagaaga agtggaggtc ctcaaagagc 420 aaatcaaaga actaatagag aaaaattcccagctggagca ggagaacaat ctgctgaaga 480 cactggccag tcctgagcag cttgcccagtttcaggccca gctgcagact ggctcccccc 540 ctgccaccac ccagccacag ggcaccacacagccccccgc ccagccagca tcgcagggct 600 caggaccaac cgcatagctg cctatgcccccgcagaactg gctgctgcgt gtgaactgaa 660 cagacggaga agatgtgcta gggagaatctgcctccacag tcacccattt cattgctcgc 720 tgcgaaagag acgtgagact gacatatgccattatctctt ttccagtatt aaacactcat 780 atgcttatgg cttggagaaa tttcttagttgggtgaatta aaggttaatc cgagaattag 840 catggatata ccgggacctc atgcagcttggcagatatct gagaaatggt ttaattcatg 900 ctcaggagct gtgtgccttt ccatcccttccggctcccta cccctcactt ccaagggttc 960 tctctcctgc ttgcgcttag tgtcctacatggggttgtga agcgatggag ctcctcactg 1020 gactcgcctc tctcctctcc tccccccaggaggaacttga aaggagggta aaaagactaa 1080 aatgaggggg aacagagttc actgtacaaatttgacaact gtcaccaaaa ttcataaaaa 1140 acaatagtac tgtgcctctt tcttctcaaacaatggatga cacaaaacta tgagagtgac 1200 aaaatggtga caggtagctg ggacctaggctatcttacca tgaaggttgt tttgcttatt 1260 gtatatttgt gtatgtagtg taactattttgtacaataga ggactgtaac tactatttag 1320 gttgtacaga ttgaaattta gttgtttcattggctgtctg aggaggtgtg gacttttata 1380 tatagatcta cataaaaact gctacatgacaaaaaccaca cctaaagaaa ttttaagaat 1440 ttggcacagt tactcacttt gtgtaatctgaaatctagct gctgaatacg ctgaagtaaa 1500 tccttgttca ctgaagtctt tcaattgagctggttgaata ctttgaaaaa tgctcagttc 1560 taactaatga aatggatttc ccagtaggggtttctgcata tcacctgtat agtagttata 1620 tgcatatgtt tctgtgcatg ttctctacacaattgtaagg tgtcactgta tttaactgtt 1680 gcacttgtca actttcaata aagcatataaatgttgat 1718 5 392 DNA Homo sapiens unsure (342) unsure (362) 5gctcctacca cccagacacc caaacagccg tggccccaga ggtcctggcc aaatatgggg 60gcctgcctag gttggtggaa cagtgctcct tatgtaaact gagccctttg tttagaaaac 120aattccaaat gtgaaactag aatgagaggg aagagatagc atggcatgca gcacacacgg 180ctgctccagt tcatggcctc ccaggggtgc tggggatgca 
tccaaagtgg ttgtctgaga 240cagagttggg aaaccctcac caactgggcc tctttcacct tccacattat cccgctgcca 300ccggttgccc tgttttcatt gcaggtttca gggaccagct tngggttgcg tgcgtttttg 360cntttgccag ttcaggccga gggtgttagt tt 392 6 429 DNA Homo sapiens 6ttttttttta aggacacgag agagccatat ttatttcaca tggacaagca tgattccatt 60gcatgctgaa catgaaagct cgtatgagca aagtacccgt aacagcagaa ttatgtgctt 120ttgtccacag ggagcaggga gaatcacaaa gttgttttca gagacagtgt ttttcaagca 180cagttgagac cataggctct ggaagtcact ggtttatttc atcaccaaag ggtctgtctc 240ccagggagtg gccggagtgc tttcagcttt gcaatctctc aatgaattga taaggtctga 300ggagggctga ggatggtctc ccatcccacc acccagagca tctttgaagg aaatgaagct 360cagaggggaa ggttacatgc cattgggaat ttaacaaggg ccattcctgg gttggacaat 420gacagggga 429 7 1305 DNA Homo sapiens 7 cgcggctcag taattgaagg cctgaaacgcccatgtgcca ctgactagga ggcttccctg 60 ctgcggcact tcatgaccca gcggcgcgcggcccagtgaa gccaccgtgg tgtccagcat 120 ggccgcgctg ctcctgggcg cggtgctgctggtggcccag ccccagctag tgccttcccg 180 ccccgccgag ctaggccagc aggagcttctgcggaaagcg gggaccctcc aggatgacgt 240 ccgcgatggc gtggccccaa acggctctgcccagcagttg ccgcagacca tcatcatcgg 300 cgtgcgcaag ggcggcacgc gcgcactgctggagatgctc agcctgcacc ccgacgtggc 360 ggccgcggag aacgaggtcc acttcttcgactgggaggag cattacagcc acggcttggg 420 ctggtacctc agccagatgc ccttctcctggccacaccag ctcacagtgg agaagacccc 480 cgcgtatttc acgtcgccca aagtgcctgagcgagtctac agcatgaacc cgtccatccg 540 gctgctgctc atcctgcgag acccgtcggagcgcgtgcta tctgactaca cccaagtgtt 600 ctacaaccac atgcagaagc acaagccctacccgtccatc gaggagttcc tggtgcgcga 660 tggcaggctc aatgtggact acaaggccctcaaccgcagc ctctaccacg tgcacatgca 720 gaactggctg cgctttttcc cgctgcgccacatccacatt gtggacggcg accgcctcat 780 cagggacccc ttccctgaga tccaaaaggtcgagaggttc ctaaagctgt cgccgcagat 840 caatgcttcg aacttctact ttaacaaaaccaagggcttt tactgcctgc gggacagcgg 900 ccgggaccgc tgcttacatg agtccaaaggccgggcgcac ccccaagtcg atcccaaact 960 actcaataaa ctgcacgaat attttcatgagccaaataag aagttcttcg agcttgttgg 1020 cagaacattt gactggcact gatttgcaataagctaagct cagaaacttt cctactgtaa 1080 
gttctggtgt acatctgagg ggaaaaagaattttaaaaaa gcatttaagg tataatttat 1140 ttgtaaaatc cataaagtac ttctgtacagtattagattc acaattgcca tatatactag 1200 ttatattttt ctacttgtta aatggagggcattttgtatt gtttttcatg gttgttaaca 1260 ttgtgtaata tgtctctata tgaaggaactaaactatttc actga 1305 8 331 DNA Homo sapiens unsure (80) unsure (104)unsure (115) unsure (135) unsure (186) unsure (308) unsure (271) 8gctcaggaca gatgccacac aaggatagat gctggcccag ggccaagagc ccagctccaa 60ggggaatcag aactcaaatn gggccagatc cagcctgggg tctngagttg atctngaacc 120cagactcaga cattngcacc taatccaggc agatccagga ctatatttgg gcctgctcca 180gacctngatc ctggaggccc agttcaccct gatttaggag aagccaggaa tttcccagga 240ccctgaaggg gccatgatgg caacagatct ngaacctcag cctggccaga cacaggccct 300ccctgttncc cagagaaagg ggagcccact g 331 9 346 DNA Homo sapiens unsure(40) unsure (286) unsure (320) 9 tttattgcac ttgcaacaga gtttaaataagtcctgggtn tctggtgcca aggtgaggga 60 agggttgggc agagagatga ggggcagcatcagtgcagct ggcaggcaga acccaaattc 120 tgcaggccca ggacagtggg ctcccctttctctggggaac agggagggcc tgtgtctggc 180 caggctgagg ttccagatct gttgccatcatggccccttc agggtcctgg ggaaattcct 240 gggcttctcc taaatcaggg tgaactgggcctccagggat caggtntggg agcaggccca 300 aatataagtc ctgggatctn cctgggattagggtgccaat gtctga 346 10 4132 DNA Homo sapiens 10 cgctggggcc cccggcgccgacccccgctg ctgccgctgc tgttgctgct gctgccgccg 60 ccacccaggg tcgggggcttcaacttagac gcggaggccc cagcagtact ctcggggccc 120 ccgggctcct tcttcggattctcagtggag ttttaccggc cgggaacaga cggggtcagt 180 gtgctggtgg gagcacccaaggctaatacc agccagccag gagtgctgca gggtggtgct 240 gtctacctct gtccttggggtgccagcccc acacagtgca cccccattga atttgacagc 300 aaaggctctc ggctcctggagtcctcactg tccagctcag agggagagga gcctgtggag 360 tacaagtcct tgcagtggttcggggcaaca gttcgagccc atggctcctc catcttggca 420 tgcgctccac tgtacagctggcgcacagag aaggagccac tgagcgaccc cgtgggcacc 480 tgctacctct ccacagataacttcacccga attctggagt atgcaccctg ccgctcagat 540 ttcagctggg cagcaggacagggttactgc caaggaggct tcagtgccga gttcaccaag 600 actggccgtg tggttttaggtggaccagga agctatttct ggcaaggcca 
gatcctgtct 660 gccactcagg agcagattgcagaatcttat taccccgagt acctgatcaa cctggttcag 720 gggcagctgc agactcgccaggccagttcc atctatgatg acagctacct aggatactct 780 gtggctgttg gtgaattcagtggtgatgac acagaagact ttgttgctgg tgtgcccaaa 840 gggaacctca cttacggctatgtcaccatc cttaatggct cagacattcg atccctctac 900 aacttctcag gggaacagatggcctcctac tttggctatg cagtggccgc cacagacgtc 960 aatggggacg ggctggatgacttgctggtg ggggcacccc tgctcatgga tcggacccct 1020 gacgggcggc ctcaggaggtgggcagggtc tacgtctacc tgcagcaccc agccggcata 1080 gagcccacgc ccacccttaccctcactggc catgatgagt ttggccgatt tggcagctcc 1140 ttgacccccc tgggggacctggaccaggat ggctacaatg atgtggccat cggggctccc 1200 tttggtgggg agacccagcagggagtagtg tttgtatttc ctgggggccc aggagggctg 1260 ggctctaagc cttcccaggttctgcagccc ctgtgggcag ccagccacac cccagacttc 1320 tttggctctg cccttcgaggaggccgagac ctggatggca atggatatcc tgatctgatt 1380 gtggggtcct ttggtgtggacaaggctgtg gtatacaggg gccgccccat cgtgtccgct 1440 agtgcctccc tcaccatcttccccgccatg ttcaacccag aggagcggag ctgcagctta 1500 gaggggaacc ctgtggcctgcatcaacctt agcttctgcc tcaatgcttc tggaaaacac 1560 gttgctgact ccattggtttcacagtggaa cttcagctgg actggcagaa gcagaaggga 1620 ggggtacggc gggcactgttcctggcctcc acgcaggcaa ccctgaccca gaccctgctc 1680 atccagaatg gggctcgagaggattgcaga gagatgaaga tctacctcag gaacgagtca 1740 gaatttcgag acaaactctcgccgattcac atcgctctca acttctcctt ggacccccaa 1800 gccccagtgg acagccacggcctcaggcca gccctacatt atcagagcaa gagccggata 1860 gaggacaagg ctcagatcttgctggactgt ggagaagaca acatctgtgt gcctgacctg 1920 cagctggaag tgtttggggagcagaaccat gtgtacctgg gtgacaagaa tgccctgaac 1980 ctcactttcc atgcccagaatgtgggtgag ggtggcgcct atgaggctga gcttcgggtc 2040 accgcccctc cagaggctgagtactcagga ctcgtcagac acccagggaa cttctccagc 2100 ctgagctgtg actactttgccgtgaaccag agccgcctgc tggtgtgtga cctgggcaac 2160 cccatgaagg caggagccagtctgtggggt ggccttcggt ttacagtccc tcatctccgg 2220 gacactaaga aaaccatccagtttgacttc cagatcctca gcaagaatct caacaactcg 2280 caaagcgacg tggtttcctttcggctctcc gtggaggctc aggcccaggt caccctgaac 2340 ggtgtctcca 
agcctgaggcagtgctattc ccagtaagcg actggcatcc ccgagaccag 2400 cctcagaagg aggaggacctgggacctgct gtccaccatg tctatgagct catcaaccaa 2460 ggccccagct ccattagccagggtgtgctg gaactcagct gtccccaggc tctggaaggt 2520 cagcagctcc tatatgtgaccagagttacg ggactcaact gcaccaccaa tcaccccatt 2580 aacccaaagg gcctggagttggatcccgag ggttccctgc accaccagca aaaacgggaa 2640 gctccaagcc gcagctctgcttcctcggga cctcagatcc tgaaatgccc ggaggctgag 2700 tgtttcaggc tgcgctgtgagctcgggccc ctgcaccaac aagagagcca aagtctgcag 2760 ttgcatttcc gagtctgggccaagactttc ttgcagcggg agcaccagcc atttagcctg 2820 cagtgtgagg ctgtgtacaaagccctgaag atgccctacc gaatcctgcc tcggcagctg 2880 ccccaaaaag agcgtcaggtggccacagct gtgcaatgga ccaaggcaga aggcagctat 2940 ggcgtcccac tgtggatcatcatcctagcc atcctgtttg gcctcctgct cctaggtcta 3000 ctcatctaca tcctctacaagcttggattc ttcaaacgct ccctcccata tggcaccgcc 3060 atggaaaaag ctcagctcaagcctccagcc acctctgatg cctgagtcct cccaatttca 3120 gactcccatt cctgaagaaccagtcccccc accctcattc tactgaaaag gaggggtctg 3180 ggtacttctt gaaggtgctgacggccaggg agaagctcct ctccccagcc cagagacata 3240 cttgaagggc cagagccaggggggtgagga gctggggatc cctccccccc atgcactgtg 3300 aaggaccctt gtttacacataccctcttca tggatggggg aactcagatc cagggacaga 3360 ggcccagcct ccctgaagcctttgcatttt ggagagtttc ctgaaacaac ttggaaagat 3420 aactaggaaa tccattcacagttctttggg ccagacatgc cacaaggact tcctgtccag 3480 ctccaacctg caaagatctgtcctcagcct tgccagagat ccaaaagaag cccccagcta 3540 agaacctgga acttggggagttaagacctg gcagctctgg acagccccac cctggtgggc 3600 caacaaagaa cactaactatgcatggtgcc ccaggaccag ctcaggacag atgccacaca 3660 aggatagatg ctggcccagggccagagccc agctccaagg ggaatcagaa ctcaaatggg 3720 gccagatcca gcctggggtctggagttgat ctggaaccca gactcagaca ttggcaccta 3780 atccaggcag atccaggactatatttgggc ctgctccaga cctgatcctg gaggcccagt 3840 tcaccctgat ttaggagaagccaggaattt cccaggacct gaaggggcca tgatggcaac 3900 agatctggaa cctcagcctggccagacaca ggccctccct gttccccaga gaaaggggag 3960 cccactgtcc tgggcctgcagaatttccct tctgcctgcc agctgcactg atgctgcccc 4020 tcatctctct gcccaacccttccctcacct tggcaccaga 
cacccaggac ttatttaaac 4080 tctgttgcaa gtgcaataaatctgacccag tgcccccact gaccagaact ag 4132 11 486 DNA Homo sapiens unsure(376) unsure (395) unsure (467) 11 agcctgatct ctgtccaccg gtcctttataccctcatgac ccgctgctgg gactacgacc 60 ccagtgaccg gccccgcttc accgagctggtgtgcagcct cagtgacgtt tatcagatgg 120 agaaggacat tgccatggag caagagaggaatgctcgcta ccgaaccccc aaaatcttgg 180 agcccacagc cttccaggaa cccccacccaagcccagccg acctaagtac agaccccctc 240 cgcaaaccaa cctcctgggc tccaaagctgcagttccagg ttcctgaggg tctgtgtgcc 300 agctctcctg acggcttcac cagccctatgggagtattcc attcttcccg ttaaattcac 360 tggcacaccc cacctnttcc accgggcacaatgtntttca aaacggccac aggatggggg 420 ggagggaggg attttcattc caacccaggcaggccgagga agagggncca gcagttgttg 480 gggagg 486 12 393 DNA Homo sapiensunsure (349) unsure (360) unsure (377) unsure (384) 12 ttttttttttttttgcaaat gggacaattt taattcaacc acaagtcaaa tagaaagaag 60 ttaaaagaatgtttatgcaa acacatgaga aaagaagggt gcagatgaga atgggggttg 120 gggagagaaagaggaggagt aagaaaagag ggaaaagcaa gggaaagtaa aggaagaaag 180 agaaagaggggcaggaagag agcggatttg gcccaaggtc ctatcttggc cgcatctctc 240 tgcttcttccccctgatgct tggtttgttg acaacacagc atcctgtgcc tgggactccc 300 aattagcttgttcctgggac tgtgccccag ggtcctccct caggagggnc acatgctgtn 360 cagtccagaccaaactncac attnaaataa ttt 393 13 4089 DNA Homo sapiens 13 gaattccgtcagccctttta ctcagccaca gcctccggag ccgttgcaca cctacctgcc 60 cggccgacttacctgtactt gccgccgtcc cggctcacct ggcggtgccc gaggagtagt 120 cgctggagtccgcgcctccc tgggactgca atgtgccgat cttagctgct gcctgagagg 180 atgtctggggtgtccgagcc cctgagtcga gtaaagttgg gcacgttacg ccggcctgaa 240 ggccctgcagagcccatggt ggtggtacca gtagatgtgg aaaaggagga cgtgcgtatc 300 ctcaaggtctgcttctatag caacagcttc aatcctggga aaaacttcaa actggtcaaa 360 tgcactgtccagacggagat ccgggagatc atcacctcca tcctgctgag cgggcggatc 420 gggcccaacatccggttggc tgagtgctat gggctgaggc tgaagcacat gaagtccgat 480 gagatccactggctgcaccc acagatgacg gtgggtgagg tgcaggacaa gtatgagtgt 540 ctgcacgtggaagccgagtg gaggtatgac cttcaaatcc gctacttgcc agaagacttc 600 atggagagcctgaaggagga caggaccacg 
ctgctctatt tttaccaaca gctccggaac 660 gactacatgcagcgctacgc cagcaaggtc agcgagggca tggccctgca gctgggctgc 720 ctggagctcaggcggttctt caaggatatg ccccacaatg cacttgacaa gaagtccaac 780 ttcgagctcctagaaaagga agtggggctg gacttgtttt tcccaaagca gatgcaggag 840 aacttaaagcccaaacagtt ccggaagatg atccagcaga ccttccagca gtacgcctcg 900 ctcagggaggaggagtgcgt catgaagttc ttcaacactc tcgccccgtt cgccaacatc 960 gaccaggagacctaccgctg tgaactcatt caaggatgga acattactgt ggacctggtc 1020 attggccctaaagggatccg ccagctgact agtcaggacg caaagcccac ctgcctggcc 1080 gagttcaagcagatcaggtc catcaggtgc ctcccgctgg aggagggcca ggcagtactt 1140 cagctgggcattgaaggtgc cccccaggcc ttgtccatca aaacctcatc cctagcagag 1200 gctgagaacatggctgacct catagacggc tactgccggc tgcagggtga gcaccaaggc 1260 tctctcatcatccatcctag gaaagatggt gagaagcgga acagcctgcc ccagatcccc 1320 atgctaaacctggaggcccg gcggtcccac ctctcagaga gctgcagcat agagtcagac 1380 atctacgcagagattcccga cgaaaccctg cgaaggcccg gaggtccaca gtatggcatt 1440 gcccgtgaagatgtggtcct gaatcgtatt cttggggaag gcttttttgg ggaggtctat 1500 gaaggtgtctacacaaatca taaaggggag aaaatcaatg tagctgtcaa gacctgcaag 1560 aaagactgcactctggacaa caaggagaag ttcatgagcg aggcagtgat catgaagaac 1620 ctcgaccacccgcacatcgt gaagctgatc ggcatcattg aagaggagcc cacctggatc 1680 atcatggaattgtatcccta tggggagctg ggccactacc tggagcggaa caagaactcc 1740 ctgaaggtgctcaccctcgt gctgtactca ctgcagatat gcaaagccat ggcctacctg 1800 gagagcatcaactgcgtgca cagggacatt gctgtccgga acatcctggt ggcctcccct 1860 gagtgtgtgaagctggggga ctttggtctt tcccggtaca ttgaggacga ggactattac 1920 aaagcctctgtgactcgtct ccccatcaaa tggatgtccc cagagtccat taacttccga 1980 cgcttcacgacagccagtga cgtctggatg ttcgccgtgt gcatgtggga gatcctgagc 2040 tttgggaagcagcccttctt ctggctggag aacaaggatg tcatcggggt gctggagaaa 2100 ggagaccggctgcccaagcc tgatctctgt ccaccggtcc tttataccct catgacccgc 2160 tgctgggactacgaccccag tgaccggccc cgcttcaccg agctggtgtg cagcctcagt 2220 gacgtttatcagatggagaa ggacattgcc atggagcaag agaggaatgc tcgctaccga 2280 acccccaaaatcttggagcc cacagccttc caggaacccc cacccaagcc cagccgacct 2340 
aagtacagaccccctccgca aaccaacctc ctggctccaa agctgcagtt ccaggttcct 2400 gagggtctgtgtgccagctc tcctacgctc accagcccta tggagtatcc atctcccgtt 2460 aactcactgcacaccccacc tctccaccgg cacaatgtct tcaaacgcca cagcatgggg 2520 gaggaggacttcatccaacc cagcagccga gaagaggccc agcagctgtg ggaggctgaa 2580 aaggtcaaaatgcggcaaat cctggacaaa cagcagaagc agatggtgga ggactaccag 2640 tggctcaggcaggaggagaa gtccctggac cccatggttt atatgaatga taagtcccca 2700 ttgacgccagagaaggaggt cggctacctg gagttcacag ggcccccaca gaagcccccg 2760 aggctgggcgcacagtccat ccagcccaca gctaacctgg accggaccga tgacctggtg 2820 tacctcaatgtcatggagct ggtgcgggcc gtgctggagc tcaagaatga gctctgtcag 2880 ctgccccccgagggctacgt ggtggtggtg aagaatgtgg ggctgaccct gcggaagctc 2940 atcgggagcgtggatgatct cctgccttcc ttgccgtcat cttcacggac agagatcgag 3000 ggcacccagaaactgctcaa caaagacctg gcagagctca tcaacaagat gcggctggcg 3060 cagcagaacgccgtgacctc cctgagtgag gagtgcaaga ggcagatgct gacggcttca 3120 cacaccctggctgtggacgc caagaacctg ctcgacgctg tggaccaggc caaggttctg 3180 gccaatctggcccacccacc tgcagagtga cggagggtgg gggccacctg cctgcgtctt 3240 ccgcccctgcctgccatgta cctcccctgc cttgctgttg gtcatgtggg tcttccaggg 3300 agaaggccaaggggagtcac cttcccttgc cactttgcac gacgccctct ccccacccct 3360 acccctggctgtactgctca ggctgcagct ggacagaggg gactctgggc tatggacaca 3420 gggtgacggtgacaaagatg gctcagaggg ggactgctgc tgcctggcca ctgctcccta 3480 agccagcctggtccatgcag ggggctcctg ggggtgggga ggtgtcacat ggtgccccta 3540 gctttatatatggacatggc aggccgattt gggaaccaag ctattccttt cccttcctct 3600 tctcccctcagatgtccctt gatgcacaga gaagctgggg aggagctttg ttttcggggg 3660 tcaggcagccagtgagatga gggatgggcc tggcattctt gtacagtgta tattgaaatt 3720 tatttaatgtgaggtttggt ctggactgac agcatgtgcc ctcctgaggg aggaccaggg 3780 cacagtccaggaacaagcta attgggagtc caggcacagg atgctgtgtt gtcaacaaac 3840 caagcatcagggggaagaag cagagagatg cggccaagat aggaccttgg gccaaatccg 3900 ctctcttcctgcccctcttt ctctttcttc ctttactttc ccttgctttt ccctcttttc 3960 ttactcctcctctttctctc ccccaccccc attctcatct gcacccttct tttctcatgt 4020 gtttgcataaacattctttt aacttctttc 
tatttgactt gtggttgaat taaaattgtc 4080 ccatttgca4089 14 464 DNA Homo sapiens unsure (146) unsure (448) 14 gacctggagatcaacgggga gaaggtgaag ctgcagatct gggacacagc ggggcaggag 60 cgcttccgcaccatcacctc cacgtattat cgggggaccc acggggtcat ttgtggttta 120 cgacgtcaccagtgccgagt cctttntcaa cgtcaagcgg tggcttcacg aaatcaacca 180 gaactgtgatgatgtgtgcc gaatattagt gggtaataag aatgacgacc ctgagcggaa 240 ggtggtggagacggaagatg cctacaaatt cgccgggcag atgggcatcc agttgttcga 300 gaccagcgccaaggagaatg tcaacgtggg aagagatgtt tcaactgcat tcacggagct 360 ggtcctccgagcaaagaaag acaaccttgg gcaaaacagc agcagcaaca acagaacgat 420 gttggttgaagtttacgaag gaacattnaa cgaaagaaac gttt 464 15 373 DNA Homo sapiens 15tttttttttt tttttttttt taattgtgag gaatttaatt cacttgattt ggcttcattt 60tcttgatctg ttaaaataat cctcccatag cccccctgcc agccccatct ctgcacgaac 120ctaccccgac ctttctgttg gaactgaaac ctgttggtgt aaatgagaag ccatggctgc 180cctgggtttg gagctcagag gcatctagaa ggcaggacaa gaaatctgtt ggccaaaggg 240caagacctgc cacctctgtg gaactgcagg gcctgccttg agaccaggtt ccccagctcc 300cagaatggct gtggggacag gacaacgggg agggaaggga gctggcacag gccccggaga 360aggggcaaga ccc 373 16 730 DNA Homo sapiens 16 gctgccggag cagcccgaagagctgcggat cgcgaggcca gtaccgaccc cgcccgcccg 60 cgcgctccgc ccccgcccgccatggcccgg gactacgacc acctcttcaa gctgctcatc 120 atcggcgaca gcggtgtgggcaagagcagt ttactgttgc gttttgcaga caacactttc 180 tcaggcagct acatcaccacgatcggagtg gatttcaaga tccggaccgt ggagatcaac 240 ggggagaagg tgaagctgcagatctgggac acagcggggc aggagcgctt ccgcaccatc 300 acctccacgt attatcgggggacccacggg gtcattgtgg tttacgacgt caccagtgcc 360 gagtcctttg tcaacgtcaagcggtggctt cacgaaatca accagaactg tgatgatgtg 420 tgccgaatat tagtgggtaataagaatgac gaccctgagc ggaaggtggt ggagacggaa 480 gatgcctaca aattcgccgggcagatgggc atccagttgt tcgagaccag cgccaaggag 540 aatgtcaacg tggaagagatgttcaactgc atcacggagc tggtcctccg agcaaagaaa 600 gacaacctgg caaaacagcagcagcaacaa cagaacgatg tggtgaagct cacgaagaac 660 agtaaacgaa agaaacgctgctgctaatgg cacccagtcc actgcagaga ctgcactgcg 720 gtccctcccc 730 17 334DNA Homo sapiens unsure 
(61) unsure (223) unsure (230) unsure (304) 17acagagtagc agctcagatg ccagagatcg aaagaaggct cgaatgagtg agctggaaca 60naagtggtag atttagaaga agagaaccaa aaacttttgc tagaaaatca gcttttacga 120gagaaaactc atggccttgt agttgagaac caggagttaa gacagcgctt ggggatggat 180gccctggttg ctgaagagga ggcggagcaa ggggaatgaa gtnaggccan tgcgggtctg 240ctgagtccgc agcactcaga ctacgtgcac ctctgcagca ggtgcaggcc cagttgtcac 300cctncagaac atctccccat ggattctggc ggta 334 18 412 DNA Homo sapiens unsure(120) unsure (153) unsure (210) unsure (372) unsure (381) unsure (411)18 tttttttttg ctgcattgta ccttttaatt gcatgggtag ttttaaataa atggagaaag 60cacctttcag aagctacact agcaggaaaa aattccatca agcatttaca tagtaaattn 120ctataatttc acaaaagatt cttgatctta ctngaagtat acatgaggga aagagccccc 180tcagcaggtg ttcccgttgc ttacagaagn aaactaaagg acctaaaact ggaggcaagc 240cagggtgcca aaaaggggga agagaaatga taaagaacca ttcataaatt ccatgtctac 300ttcaaggaca tttgtctaat gacccttaca taataagtat tttaggggaa aactaccacc 360ctttttaagg tnaaagtaca nttcttaaaa ggctggtagg tttctcaatt nt 412 19 1818DNA Homo sapiens 19 tagtctggag ctatggtggt ggtggcagcc gcgccgaacccggccgacgg gacccctaaa 60 gttctgcttc tgtcggggca gcccgcctcc gccgccggagccccggcggc caggctgccg 120 ctcatggtgc cagcccagag aggggccagc ccggaggcagcgagcggggg gctgccccag 180 gcgcgcaagc gacagcgcct cacgcacctg agccccgaggagaaggcgct gaggaggaaa 240 ctgaaaaaca gagtagcagc tcagactgcc agagatcgaaagaaggctcg aatgagtgag 300 ctggaacagc aagtggtaga tttagaagaa gagaaccaaaaacttttgct agaaaatcag 360 cttttacgag agaaaactca tggccttgta gttgagaaccaggagttaag acagcgcttg 420 gggatggatg ccctggttgc tgaagaggag gcggaagccaaggggaatga agtgaggcca 480 gtggccgggt ctgctgagtc cgcagcactc agactacgtgcacctctgca gcaggtgcag 540 gcccagttgt cacccctcca gaacatctcc ccatggattctggcggtatt gactcttcag 600 attcagagtc tgatatcctg ttgggcattc tggacaacttggacccagtc atgttcttca 660 aatgcccttc cccagagcct gccagcctgg aggagctcccagaggtctac ccagaaggac 720 ccagttcctt accagcctcc ctttctctgt cagtggggacgtcatcagcc aagctggaag 780 ccattaatga actaattcgt tttgaccaca tatataccaagcccctagtc ttagagatac 
840 cctctgagac agagagccaa gctaatgtgg tagtgaaaatcgaggaagca cctctcagcc 900 cctcagagaa tgatcaccct gaattcattg tctcagtgaaggaagaacct gtagaagatg 960 acctcgttcc ggagctgggt atctcaaatc tgctttcatccagccactgc ccaaagccat 1020 cttcctgcct actggatgct acagtgactg tggatacgggggttcccttt ccccattcag 1080 tgacatgtcc tctctgcttg gtgtaaacat tcttgggaggacacttttgc caatgaactc 1140 tttccccagc tgattagtgt ctaaggaatg atccaatactgttgcccttt tccttgacta 1200 ttacactgcc tggaggatag cagagaagcc tgtctgtacttcattcaaaa agccaaaata 1260 gagagtatac agtcctagag aatccctcta tttgttcagatctcatagat gacccccagg 1320 tattgccttt tgacatccag cagtccaagg tattgagacatattactgga agtaagaaat 1380 attactataa ttgagaacta cagcttttaa gattgtacttttaagattgt acttttatct 1440 taaaagggtg gtagttttcc ctaaaatact tattatgtaagggtcattag acaaatgtct 1500 tgaagtagac atggaattta tgaatggtct ttatcatttctcttccccct ttttggcatc 1560 ctggcttgcc tccagtttta ggtcctttag tttgcttctgcaagcaacgg gaacacctgc 1620 tgagggggct ctttccctca tgtatacttc aagtaagatcaagaatcttt tgtgaaatta 1680 tagaaattta ctatgtaaat gcttgatgga attttttcctgctagtgtag cttctgaaag 1740 gtgctttctc catttattta aaaactaccc atgcaattaaaaggtacaat gcaaaaaaaa 1800 aaaaaaaaaa attttttt 1818 20 350 DNA Homosapiens unsure (68) unsure (86) unsure (188) unsure (253) 20 aaacagtaattctttagact ttattaaaaa atgacataaa gtgcatctta ttaaaaaatg 60 tataaaanccacataaattc cagggncccc tgtgcctggg cagtgttgat atcccttaga 120 gtggaggaaggtgagggatg gagggtgaac tggggactgg ggagaggacc agggtgcagt 180 tagttccncgtgtttgagtt caaagatgga gcgagggtgg atatggtggg aaggggcaca 240 cgggttctcacgncaacaac ggaggaaggc aggcgacagt ctcttccctg aattctgagg 300 gaaaggcgtacattgtcacg aaatctctcc tgagctcgcg ctgtcctctc 350 21 394 DNA Homo sapiensunsure (208) unsure (345) unsure (361) unsure (373) unsure (378) 21gaaggaactg gtctgctcac acttgctggc ttgcgcatca ggactggctt tatctcctga 60ctcacggtgc aaaggtgcac tctgcgaacg ttaagtccgt ccccagcgct tggaatccta 120cggcccccac agccggatcc cctcagcctt ccaggtcctc aactcccgtg gacgctgaac 180aatggcctcc atggggctac aggtaatngg catcgcgctg gccgtcctgg gctggctggc 
240cgtcatgctg tgctgcgcgc tgcccatgtg gcgcgtgacg gcctttcatc ggcagcaaca 300ttgtcaactt gcagaccatc tgggaagggc ctattggatg aactncgtgg ttcaaaagcc 360ngtccaagat tgnatttnaa aggttttaac gatt 394 22 1665 DNA Homo sapiens 22gaaggaactg gttctgctca cacttgctgg cttgcgcatc aggactggct ttatctcctg 60actcacggtg caaaggtgca ctctgcgaac gttaagtccg tccccagcgc ttggaatcct 120acggccccca cagccggatc ccctcagcct tccaggtcct caactcccgt ggacgctgaa 180caatggcctc catggggcta caggtaatgg gcatcgcgct ggccgtcctg ggctggctgg 240ccgtcatgct gtgctgcgcg ctgcccatgt ggcgcgtgac ggccttcatc ggcagcaaca 300ttgtcacctc gcagaccatc tgggagggcc tatggatgaa ctgcgtggtg cagagcaccg 360gccagatgca gtgcaaggtg tacgactcgc tgctggcact gccgcaggac ctgcaggcgg 420cccgcgccct cgtcatcatc agcatcatcg tggctgctct gggcgtgctg ctgtccgtgg 480tggggggcaa gtgtaccaac tgcctggagg atgaaagcgc caaggccaag accatgatcg 540tggcgggcgt ggtgttcctg ttggccggcc ttatggtgat agtgccggtg tcctggacgg 600cccacaacat catccaagac ttctacaatc cgctggtggc ctccgggcag aagcgggaga 660tgggtgcctc gctctacgtc ggctgggccg cctccggcct gctgctcctt ggcggggggc 720tgctttgctg caactgtcca ccccgcacag acaagcctta ctccgccaag tattctgctg 780cccgctctgc tgctgccagc aactacgtgt aaggtgccac ggctccactc tgttcctctc 840tgctttgttc ttccctggac tgagctcagc gcaggctgtg accccaggag ggccctgcca 900cgggccactg gctgctgggg actggggact gggcagagac tgagccaggc aggaaggcag 960cagccttcag cctctctggc ccactcggac aacttcccaa ggccgcctcc tgctagcaag 1020aacagagtcc accctcctct ggatattggg gagggacgga agtgacaggg tgtggtggtg 1080gagtggggag ctggcttctg ctggccagga tagcttaacc ctgactttgg gatctgcctg 1140catcggcgtt ggccactgtc cccatttaca ttttccccac tctgtctgcc tgcatctcct 1200ctgttccggg taggccttga tatcacctct gggactgtgc cttgctcacc gaaacccgcg 1260cccaggagta tggctgaggc cttgcccacc cacctgcctg ggaagtgcag agtggatgga 1320cgggtttaga ggggaggggc gaaggtgctg taaacaggtt tgggcagtgg tgggggaggg 1380ggccagagag gcggctcagg ttgcccagct ctgtggcctc aggactctct gcctcacccg 1440cttcagccca gggcccctgg agactgatcc cctctgagtc ctctgcccct tccaaggaca 1500ctaatgagcc tgggagggtg gcagggagga ggggacagct tcacccttgg 
aagtcctggg 1560gtttttcctc ttccttcttt gtggtttctg ttttgtaatt taagaagagc tattcatcac 1620tgtaattatt attattttct acaataaatg ggacctgtgc acagg 1665 23 345 DNA Homosapiens unsure (291) 23 aggtcctact ggaaggagtt cctggtgatg tgcacgctctttgtgctggc cgtgctgctc 60 ccagttttat tcttgctcta ccggcaccgg aacagcatgaaagtcttcct gaagcagggg 120 gaatgtgcca gcgtgcaccc caagacctgc cctgtggtgctgccccctga gacccgccca 180 ctcaacggcc tagggcccct agcaccccgc tcgatcaccgagggtaccag tccctgtcag 240 acagcccccc ggggttcccg agtcttcact gagtcagagaagaggccact nagcatccaa 300 gacagcttcg tgggaggtat ccccagtgtg cccccggccccgggg 345 24 2433 DNA Homo sapiens 24 gaagaaaggc tgattagaaa atttgaagctgaaaacatct ccaactacac ggcccttctg 60 ctgagccagg atggaaagac gctgtatgtgggggcccgag aggccctctt tgcacttaac 120 agcaacctca gcttcttgcc aggcggggagtaccaagagc tactgtggag tgcagatgct 180 gacaggaagc agcagtgcag cttcaagggcaaggacccaa agcgtgactg tcaaaactac 240 atcaagatcc tcctgccact caacagcagccacctgctca cctgtggcac ggccgccttc 300 agccccctgt gtgcttacat tcacatagcgagctttactt tagcccaaga tgaggccggt 360 aatgtcattc tggaggatgg caagggtcattgtccctttg accccaactt caagtccacg 420 gctctggtgg ttgatggtga gctgtacactggaacagtca gtagcttcca gggaaacgac 480 ccagccattt cccggagcca gagttcccgccccaccaaga ctgagagctc cctcaactgg 540 ctacaagacc ctgcctttgt ggcctcggctacgtcccccg agagcctggg cagccccata 600 ggtgatgatg ataagatcta cttcttcttcagcgagacgg gccaggagtt tgagttcttt 660 gagaacacca tcgtgtcccg agttgcccgagtctgtaagg gcgatgaggg tggagagcgg 720 gtgttgcagc aacgctggac ctcctttctcaaggctcagc tcctgtgctc ccggcctgat 780 gatggctttc cctttaacgt gctacaagatgtcttcaccc tgaaccccaa ccctcaggat 840 tggcgcaaga ccctttctat cggggtctttacctcccagt ggcacagagg gaccacagaa 900 ggctctgcca tctgcgtctt caccatgaatgatgtgcaga aggcctttga cggcctgtac 960 aagaaagtaa acagagagac acagcagtggtataccgaga cccaccaggt gcccacaccg 1020 cggccgggag cgtgcattac caacagtgcccgggaacgga agatcaactc gtccctgcag 1080 ctcccagacc gagtgctgaa cttcctcaaggatcacttct tgatggatgg gcaggtccgc 1140 agtcgcctgc tgctgctgca gcccagagcccgctaccagc gtgtggctgt gcaccgtgtg 1200 cctggcctgc 
acagcactta tgatgtcctatttctgggca ctggtgatgg ccgcctgcac 1260 aaagcagtga ccctgagctc cagagtccacatcattgagg agctgcagat cttccctcaa 1320 ggacagcctg tgcagaacct gctcttggacagccatgggg gactgttgta tgcctcctcc 1380 cattccgggg tggtgcaagt gcccgtagccaactgcagcc tgtacccaac ctgtggagac 1440 tgcctcctgg ctcgagaccc ctactgcgcctggactggct ctgcctgcag gctcgctagc 1500 ctctaccagc ctgatctggc ctccaggccatggacccagg acattgaggg tgccagtgtc 1560 aaggaactct gcaagaattc ctcatacaaggcccggtttc ttgtgccagg taagccatgt 1620 aaacaagtcc agatccaacc aaacacagtgaacaccctgg cctgcccact cctctcaaac 1680 ctggccactc ggctctgggt gcacaatggagccccagtca atgcctctgc ctcctgccgc 1740 gtgttaccca ccggggacct gctgctggtgggcagccagc agggtttggg ggtgttccag 1800 tgttggtcga tagaagaagg attccagcagcttgtggcca gctactgccc agaggtgatg 1860 gaggaggggg taatggacca aaagaaccagcgtgatggta ccccagtcat tatcaacaca 1920 tcacgagtga gtgcaccggc tggtggcagggacagctggg gtgcggacaa gtcctactgg 1980 aatgaattcc tggtgatgtg tactctgtttgtgtttgcta tggtgctttt gtttctgttc 2040 tttctctacc gacatcggga tggcatgaaactcttcctaa agcagggcga gtgtgccagt 2100 gtgcacccca agactcgccc tatagtgctaccacctgaga cccgaccgct gaatggtgtc 2160 ggccctccta gcaccccact tgaccaccgaggctaccagg ctctgtcgga tagctcccca 2220 gggcccagag tcttcactga atcagagaagaggccactga gcatccagga cagctttgta 2280 gaggtgtctc ccgtgtgtcc ccggccccgagttcgactgg gctctgagat ccgagactct 2340 gtggtatgag agctgacttt agatgtggtcaccctgacct cagggttgtg agtgtcagtg 2400 gaagtcagct acctctgctc tcacagaacacag 2433 25 463 DNA Homo sapiens unsure (368) unsure (402) unsure (458)25 gtttggcaaa aactcaagcg gctggaagga ggaagaggtt ctccagagtc ggaactgagg 60gttggaacta tacccgggac caaactcacg gaccactcga ggcctgcaaa ccttcctggg 120aggacaggca ggccagatgg ccgctccact ggggaatgct cccagctgtg ctgtggagag 180aagctgatgt tttggtgtat tgtcagccat cgtccttgga ctcggagact atggcctcgc 240tccccaccct cctcttggaa ttacaagccc tggggtttga agctgacttt atagctgcaa 300gtgtatctcc ttttatctgg tgcctcctca aacccagtct cagacactta aatgcagaca 360acaccttnct cctgcagaca cctgggactg agccaaggag gncttgggga aggcccttag 420ggggagcacc ctgatgggag 
aggacagagc aggggttnca gca 463 26 331 DNA Homosapiens unsure (13) unsure (15) unsure (322) 26 agaaaaagcc cantnttcactttattggag gtctctgcct ccattcacag gagaaaggag 60 ctgggagccc catcctaagggtcccagcat cagcccactg gagggcctgg aacagtccag 120 cactctgtgg gagaggagtggggaggggaa tgttttagaa aaaatagatc tctatgtaca 180 tctgacatat ttatatagcacataaattag ggagtgctct gacccctgcc cgtggagccc 240 aagcactgag cagggaggtgaacgccagtc cagaaagaag gtgctgggag cccctgctct 300 gtcctctcca tccacggtgctncccctagg g 331 27 1907 DNA Homo sapiens 27 cggccagata cctcagcgctacctggcgga actggatttc tctcccgcct gccggcctgc 60 ctgccacagc cggactccgccactccggta gcctcatggc tgcaacctgt gagattagca 120 acatttttag caactacttcagtgcgatgt acagctcgga ggactccacc ctggcctctg 180 ttccccctgc tgccacctttggggccgatg acttggtact gaccctgagc aacccccaga 240 tgtcattgga gggtacagagaaggccagct ggttggggga acagccccag ttctggtcga 300 agacgcaggt tctggactggatcagctacc aagtggagaa gaacaagtac gacgcaagcg 360 ccattgactt ctcacgatgtgacatggatg gcgccaccct ctgcaattgt gcccttgagg 420 agctgcgtct ggtctttgggcctctggggg accaactcca tgcccagctg cgagacctca 480 cttccagctc ttctgatgagctcagttgga tcattgagct gctggagaag gatggcatgg 540 ccttccagga ggccctagacccagggccct ttgaccaggg cagccccttt gcccaggagc 600 tgctggacga cggtcagcaagccagcccct accaccccgg cagctgtggc gcaggagccc 660 cctcccctgg cagctctgacgtctccaccg cagggactgg tgcttctcgg agctcccact 720 cctcagactc cggtggaagtgacgtggacc tggatcccac tgatggcaag ctcttcccca 780 gcgatggttt tcgtgactgcaagaaggggg atcccaagca cgggaagcgg aaacgaggcc 840 ggccccgaaa gctgagcaaagagtactggg actgtctcga gggcaagaag agcaagcacg 900 cgcccagagg cacccacctgtgggagttca tccgggacat cctcatccac ccggagctca 960 acgagggcct catgaagtgggagaatcggc atgaaggcgt cttcaagttc ctgcgctccg 1020 aggctgtggc ccaactatggggccaaaaga aaaagaacag caacatgacc tacgagaagc 1080 tgagccgggc catgaggtactactacaaac gggagatcct ggaacgggtg gatggccggc 1140 gactcgtcta caagtttggcaaaaactcaa gcggctggaa ggaggaagag gttctccaga 1200 gtcggaactg agggttggaactatacccgg gaccaaactc acggaccact cgaggcctgc 1260 aaaccttcct gggaggacaggcaggccaga tggcccctcc 
actggggaat gctcccagct 1320 gtgctgtgga gagaagctgatgttttggtg tattgtcagc catcgtcctt ggactcggag 1380 actatggcct cgcctccccaccctcctctt ggaattacaa gccctggggt ttgaagctga 1440 ctttatagct gcaagtgtatctccttttat ctggtgcctc ctcaaaccca gtctcagaca 1500 cttaaatgca gacaacaccttcttcctgca gacacttgga ctgagccaag gaggcttggg 1560 aggccctagg gagcaccgtgatggagagga cagagcaggg gctccagcac ttctttctgg 1620 actggcgttc acctccctgctcagtgcttg ggctccacgg gcaggggtca gagcactccc 1680 taatttatgt gctatataaatatgtcagat gtacatagag atctattttt tctaaaacat 1740 tcccctcccc actcctctcccacagagtgc tggactgttc caggccctcc agtgggctga 1800 tgctgggacc cttaggatggggctcccagc tcctttctcc tgtgaatgga ggcagagacc 1860 tccaataaag tgccttctgggctttttcta aaaaaaaaaa aaaaaaa 1907 28 467 DNA Homo sapiens unsure (428)unsure (462) 28 agtactacaa gcatcattct ctcaaggaag ggttcagaac cttagatacaactctgcagt 60 ttccatacaa ggagccagaa cattcagctg gacagagggg taatagagcaggcaacagct 120 tgttaagtcc aaaagtgctg ggcattgcat cgctcggtat gacttctgtgcaagagatat 180 gagagagttg tccttgttga aaggagatgt ggtgaagatt tacacaaagatgagtgcaaa 240 tggctggtgg agaggagaag taaatggcag ggtgggctgg tttccatccacatatgtggg 300 aaggaggatg aataaattca aatcccgtgt tgcaccctgc accaaaattttcagaggaag 360 gggataatta ggaagcctgc acagcttcgt ggatttaact tgaagtgtttttaaaaagct 420 ggcttttntg ggctgtttca acatcctccc tccttaggcc cntccta 46729 453 DNA Homo sapiens unsure (240) unsure (387) unsure (438) 29ttttttttcc caacatgtaa ctctctcagt cttgtcagaa cacaacttct gctatggagg 60aaatatttcc atcaggaaag ggccaagtta gtgtcttaac ttgactgcct tgaatgggga 120ctctggaccc caggaagaat gtatttaggc tcctcacaaa aaagagtgat ggctgggcaa 180aacaaatgta ctgcaagacc catcttccct ccagttaata cactcccagg gatgggnctg 240cagaggggga gactctgaga gaagctggag gcccacaaaa gtccactgac cctctttctg 300tcccagaaat gaataaagga cccagttgtg ctttccttcc aaaatcctca acaaagttgt 360ttgtgctcca aggaaaatgt gggggantta aaaaaatcat gttcccgggt catctttgtg 420tgtgttgcgg gggaggtngg tggggaggga aaa 453 30 4762 DNA Homo sapiens 30cccgccccgg cccagccgcg tcccggagcc gtcgggcatg gagccgtgga agcagtgcgc 60gcagtggctc 
atccattgca aggtgctgcc caccaaccac cgggtgacct gggactcggc 120tcaggtgttc gaccttgcgc agaccctccg cgatggagtc ctgctctgcc agctgcttaa 180caacctccgg gcgcactcca tcaacctgaa ggagatcaac ctgaggccgc agatgtccca 240gtttctctgt ttgaagaaca taaggacatt tctcacggcc tgttgtgaga cgtttggaat 300gaggaaaagt gaacttttcg aggcatttga cttgtttgat gttcgtgact ttggagaggt 360tatagaaaca ttatcacgac tttctcgaac acctatagca ttggccacag gaatcaggcc 420cttcccaaca gaagaaagca ttaatgatga agacatctac aaaggccttc ctgatttaat 480agatgaaacc cttgtggaag atgaagaaga tctctatgac tgtgtttatg gggaagatga 540aggtggagaa gtctatgagg acttaatgaa ggcagaggaa gcacatcagc ccaaatgtcc 600agaaaatgat atacgaagtt gttgtctagc agaaattaag cagacagaag aaaaatatac 660agaaactttg gagtcaatag aaaaatattt catggcacca ctaaaaagat ttctgacagc 720agcagaattt gattcagtat tcatcaacat tcctgaactt gtaaaacttc atcggaacct 780aatgcaagag attcatgatt ccattgtaaa taaaaatgac cagaacttgt accaagtttt 840tattaactac aaggaaagat tggttattta cgggcagtac tgcagtggag tggagtcagc 900catctctagt ttagactaca tttctaagtc aaaagaagat gtcaaactga aattagagga 960atgttccaaa agagcaaata atgggaaatt tactcttcga gacttgcttg tggttcctat 1020gcaacgtgtt ttaaagtacc accttctcct ccaggaactg gtcaaacata ccactgatcc 1080gactgagaag gcaaatctga aactggctct tgatgccatg aaggacttgg cacaatatgt 1140gaatgaagtg aaaagagata atgagaccct tcgtgaaatt aaacagtttc agctatctat 1200agagaatttg aaccaaccag ttttgctttt tggacgacct cagggagatg gtgaaattcg 1260aataaccact ctagacaagc ataccaaaca agaaaggcat atcttcttat ttgatttggc 1320agtgatcgta tgtaagagaa aaggtgataa ctatgaaatg aaggaaataa tagatcttca 1380gcagtacaag atagccaata atcctacaac cgataaagaa aacaaaaagt ggtcttatgg 1440cttctacctc atccataccc aaggacaaaa tgggttagaa ttttattgca aaacaaaaga 1500tttaaagaag aaatggctag aacagtttga aatggctttg tctaacataa gaccagacta 1560tgcagactcc aatttccacg acttcaagat gcataccttc actcgagtca catcctgcaa 1620agtctgccag atgctcctga ggggaacatt ttatcaaggc tatttatgtt ttaagtgtgg 1680agcgagagca cacaaagaat gtttgggaag agtagacaat tgtggcagag ttaattctgg 1740tgaacaaggg acactcaaac taccagagaa acggaccaat ggactgcgaa gaactcctaa 
1800acaggtggat ccaggtttac caaagatgca ggtcattagg aactattctg gaacaccacc 1860cccagctctg catgaaggac cccctttaca gctccaggcc ggggataccg ttgaacttct 1920gaaaggagat gcacacagtc tgttttggca gggcagaaat ttagcatctg gagaggttgg 1980attttttcca agtgatgcag tcaagccttg cccatgtgtg cccaaaccag tagattattc 2040ttgccaaccc tggtatgctg gagcaatgga aagattgcaa gcagagaccg aacttattaa 2100tagggtaaat agtacttacc ttgtgaggca caggaccaaa gagtcaggag aatatgcaat 2160tagcattaag tacaataatg aagcaaagca catcaagatt ttaacaagag atggcttttt 2220tcacattgca gaaaatagaa aatttaaaag tttaatggaa cttgtggagt actacaagca 2280tcattctctc aaggaagggt tcagaacctt agatacaact ctgcagtttc catacaagga 2340gccagaacat tcagctggac agaggggtaa tagagcaggc aacagcttgt taagtccaaa 2400agtgctgggc attgccatcg ctcggtatga cttctgtgca agagatatga gagagttgtc 2460cttgttgaaa ggagatgtgg tgaagattta cacaaagatg agtgcaaatg gctggtggag 2520aggagaagta aatggcaggg tgggctggtt tccatccaca tatgtggaag aggatgaata 2580aattcaaatc ccgtgttgca ccctgcacca aaaatttcag agaagggata aatagaagcc 2640tgcacagcat cgtgaattaa ctgaagtgtt taaaaagctg catttctggc tgttcaacat 2700cctccctcct tagcccctcc taagtcttaa tgctgagatt tctaaagatg ctggtactga 2760cagattaatg gcttgcctag agctgtgcaa gaaacagcct gccagtctgt cattgtcagg 2820gaccagggca aaaccaagag ctgttcttcc cagaagagcc ctgcaaacac attggttcgt 2880gcttcccttt acttcttctg gtcagatacc atgaatgcca gtcatcagta aatcttaata 2940cacttttgct ttattctcac atgccattca ccagattatt tgatggtaca aagaagcaga 3000agtgtaattt tccttttccc agcatgacga aaaattggag ttctgccatt tgagcagctt 3060actggagaga tccagcctta cttgtcttaa attgtccaac aaggtgactc attgcccggc 3120aaacactttt accctcagat gttactcatg atattataaa atatgaggcc agtgctcagg 3180tttgcatcat aagtgagcta tccctgaagg gttttaatta cttatttggt gtcctgatta 3240tatttgcaaa cttctttata aaaggtgaaa aaagcacaca aaagagaggg tgtcttcata 3300ttaaaccttc acaaccttca tgatttcata ggattatttt ggaaatatag cacttgactt 3360tatgaaagga tctgggctag gtatattagg ggtagttgcc aataacctga agaagctggc 3420attgtttaca gaaacagatc aagggctata atttatgtca ttttatagca gcagtatcta 3480ttaatacatg ccttttcctc ccatccacct 
cccccgcaca cacacaaaga tgacctggga 3540catgattttt ttattcccac attttcttgg agcacaaaca actttgttga ggattttgga 3600aggaaagcac aactgggtcc tttattcatt tctgggacag aaagagggtc agtggacttt 3660tgtgggcctc cagcttctct cagagtctcc ccctctgcag cccatcctgg gagtgtatta 3720actggaggga agatgggtct tgcagtacat ttgttttgcc cagccatcac tcttttttgt 3780gaggagccta aatacattct tcctggggtc cagagtcccc attcaaggca gtcaagttaa 3840gacactaact tggccctttc ctgatggaaa tatttcctcc atagcagaag ttgtgttctg 3900acaagactga gagagttaca tgttgggaaa aaaaagaagc attaacttag tagaactgaa 3960ccaggagcat taagttctga aattttgaat catctctgaa atgaagcagg tgtagcctgc 4020cctctcatca atccgtccgt ctgggtgcca gaactcaagg ttcagtggac acatccccct 4080gttagagacc ctcatgggct aggacttttc atctaggata gattcaagac ctttacctca 4140gaattatgta aactgtgatt gtgttttaga aaaattatta tttgctaaaa ccatttaagt 4200ctttgtatat gtgtaaatga tcacaaaaat gtattttata aaatgttctg tacaataaag 4260ttacacctca aagtgtactc ttggaatgga ttctttcctg taaagtctta tctgcgactc 4320tgtctcggga atgttttgtc tgttgccgtc agccgaactt tgttatggag ggagcagcct 4380cacacaagca gaaacactcc tgtggatggt attgtagcat gtattgttta ttttagtcaa 4440tagaccctct ccttataaat ggtgtttagt cttcctgttg catttcatgg gcctgggggt 4500ttcctrgcag aggatattgg agcccctttt tgtgacatta ccaattacat ctttgtccac 4560gtttaatact ttgttttgga aaatttaaat gctgcagatt tgtgtagagt tctaatacca 4620aagacagaag taaatgtttt ccatatactt tgtcttgcct gtatgcagcc cttgtgtaat 4680atggtgaatt agagtggtat ttcactttgt attattttgt aaatatgtca atataataaa 4740tagtgactaa aaaaaaaaaa aa 4762 31 422 DNA Homo sapiens unsure (80) unsure(217) unsure (254) unsure (311) unsure (321) unsure (375) unsure (381)unsure (386) unsure (394) unsure (416) 31 ttttttactt tattttcgttttaatttttt ggaaggatat acaccacata tcccatgggc 60 aataaagcgc attcaatgtntttataagcc aaacagtcac tttgtttaag caaacacaag 120 tacaaagtaa aatagaaccacaaaataatg aactgcatgt tcataacata caaaaatcgc 180 cgcctactca gtaggtaactacaacattcc aactccngaa tatatttata aatttacatt 240 ttcagttaaa aaantagacttttgagagtt cagattttgt tttagatttt gttttcttac 300 attctggaga ncccgaagctncagctcagc 
ccctcttccc ttattttgct ccccaaagcc 360 ttccccccaa atcancactgncctgncccc cctntaaggg cttagaggtg agcatntccc 420 ct 422 32 3132 DNA Homosapiens 32 ccgcagaact tggggagccg ccgccgccat ccgccgccgc agccagcttccgccgccgca 60 ggaccggccc ctgccccagc ctccgcagcc gcggcgcgtc cacgcccgcccgcgcccagg 120 gcgagtcggg gtcgccgcct gcacgcttct cagtgttccc cgcgccccgcatgtaacccg 180 gccaggcccc cgcaacggtg tcccctgcag ctccagcccc gggctgcacccccccgcccc 240 gacaccagct ctccagcctg ctcgtccagg atggccgcgg ccaaggccgagatgcagctg 300 atgtccccgc tgcagatctc tgacccgttc ggatcctttc ctcactcgcccaccatggac 360 aactacccta agctggagga gatgatgctg ctgagcaacg gggctccccagttcctcggc 420 gccgccgggg ccccagaggg cagcggcagc aacagcagca gcagcagcagcgggggcggt 480 ggaggcggcg ggggcggcag caacagcagc agcagcagca gcaccttcaaccctcaggcg 540 gacacgggcg agcagcccta cgagcacctg accgcagagt cttttcctgacatctctctg 600 aacaacgaga aggtgctggt ggagaccagt taccccagcc aaaccactcgactgcccccc 660 atcacctata ctggccgctt ttccctggag cctgcaccca acagtggcaacaccttgtgg 720 cccgagcccc tcttcagctt ggtcagtggc ctagtgagca tgaccaacccaccggcctcc 780 tcgtcctcag caccatctcc agcggcctcc tccgcctccg cctcccagagcccacccctg 840 agctgcgcag tgccatccaa cgacagcagt cccatttact cagcggcacccaccttcccc 900 acgccgaaca ctgacatttt ccctgagcca caaagccagg ccttcccgggctcggcaggg 960 acagcgctcc agtacccgcc tcctgcctac cctgccgcca agggtggcttccaggttccc 1020 atgatccccg actacctgtt tccacagcag cagggggatc tgggcctgggcaccccagac 1080 cagaagccct tccagggcct ggagagccgc acccagcagc cttcgctaacccctctgtct 1140 actattaagg cctttgccac tcagtcgggc tcccaggacc tgaaggccctcaataccagc 1200 taccagtccc agctcatcaa acccagccgc atgcgcaagt atcccaaccggcccagcaag 1260 acgccccccc acgaacgccc ttacgcttgc ccagtggagt cctgtgatcgccgcttctcc 1320 cgctccgacg agctcacccg ccacatccgc atccacacag gccagaagcccttccagtgc 1380 cgcatctgca tgcgcaactt cagccgcagc gaccacctca ccacccacatccgcacccac 1440 acaggcgaaa agcccttcgc ctgcgacatc tgtggaagaa agtttgccaggagcgatgaa 1500 cgcaagaggc ataccaagat ccacttgcgg cagaaggaca agaaagcagacaaaagtgtt 1560 gtggcctctt cggccacctc ctctctctct tcctacccgt 
ccccggttgctacctcttac 1620 ccgtccccgg ttactacctc ttatccatcc ccggccacca cctcatacccatcccctgtg 1680 cccacctcct tctcctctcc cggctcctcg acctacccat cccctgtgcacagtggcttc 1740 ccctccccgt cggtggccac cacgtactcc tctgttcccc ctgctttcccggcccaggtc 1800 agcagcttcc cttcctcagc tgtcaccaac tccttcagcg cctccacagggctttcggac 1860 atgacagcaa ccttttctcc caggacaatt gaaatttgct aaagggaaaggggaaagaaa 1920 gggaaaaggg agaaaaagaa acacaagaga cttaaaggac aggaggaggagatggccata 1980 ggagaggagg gttcctctta ggtcagatgg aggttctcag agccaagtcctccctctcta 2040 ctggagtgga aggtctattg gccaacaatc ctttctgccc acttccccttccccaattac 2100 tattcccttt gacttcagct gcctgaaaca gccatgtcca agttcttcacctctatccaa 2160 agaacttgat ttgcatggat tttggataaa tcatttcagt atcatctccatcatatgcct 2220 gaccccttgc tcccttcaat gctagaaaat cgagttggca aaatggggtttgggcccctc 2280 agagccctgc cctgcaccct tgtacagtgt ctgtgccatg gatttcgtttttcttggggt 2340 actcttgatg tgaagataat ttgcatattc tattgtatta tttggagttaggtcctcact 2400 tgggggaaaa aaaaaaaaaa aagccaagca aaccaatggt gatcctctattttgtgatga 2460 tgctgtgaca ataagtttga accttttttt ttgaaacagc agtcccagtattctcagagc 2520 atgtgtcaga gtgttgttcc gttaaccttt ttgtaaatac tgcttgaccgtactctcaca 2580 tgtggcaaaa tatggtttgg tttttctttt ttttttttga aagtgttttttcttcgtcct 2640 tttggtttaa aaagtttcac gtcttggtgc cttttgtgtg atgccccttgctgatggctt 2700 gacatgtgca attgtgaggg acatgctcac ctctagcctt aaggggggcagggagtgatg 2760 atttggggga ggctttggga gcaaaataag gaagagggct gagctgagcttcggttctcc 2820 agaatgtaag aaaacaaaat ctaaaacaaa atctgaactc tcaaaagtctatttttttaa 2880 ctgaaaatgt aaatttataa atatattcag gagttggaat gttgtagttacctactgagt 2940 aggcggcgat ttttgtatgt tatgaacatg cagttcatta ttttgtggttctattttact 3000 ttgtacttgt gtttgcttaa acaaagtgac tgtttggctt ataaacacattgaatgcgct 3060 ttattgccca tgggatatgt ggtgtatatc cttccaaaaa attaaaacgaaaataaagta 3120 gctgcgattg gg 3132 33 464 DNA Homo sapiens unsure (364)unsure (401) unsure (425) unsure (439) 33 ttaaggtata cacttttattcaactggtct caagtcagtg tacaggtaag ccctggctgc 60 ctccacccac tcccagggagaccaaaagcc ttcatacatc tcaagttggg 
ggacaaaaaa 120 gggggaaggg ggggcacgaaggctcatcat tcaaaataaa acaaaataaa aaagtattaa 180 ggcgaagatt aaaaaaattttgcattacat aatttacacg aaagcaatgc tatcacctcc 240 cctgtgtgga cttgggagaggactgggcca ttctccttag gagagaagtg ggggtgggct 300 tttagggatg ggcaaggggactttcctgtt aacaacggca tcttcatatt ttgggaattg 360 actntttaaa aaaaaccaacaatgtggcaa ttcaaagtcc ntcgggccac atttgtggaa 420 ctttnggggg gttgctcgntcccacccgac tgttgttcac cttt 464 34 3646 DNA Homo sapiens 34 gcccagcaccccaaggcggc caacgccaaa actctccctc ctcctcttcc tcaatctcgc 60 tctcgctctttttttttttc gcaaaaggag gggagagggg gtaaaaaaat gctgcactgt 120 gcggcgaagccggtgagtga gcggcgcggg gccaatcagc gtgcgccgtt ccgaaagttg 180 ccttttatggctcgagcggc cgcggcggcg ccctataaaa cccagcggcg cgacgcgcca 240 ccaccgccgagaccgcgtcc gcccgcgagc acagagcctc gcctttgccg atccgccgcc 300 cgtccacacccgccgccagg taagcccggc cagccgaccg gggcatgcgg ccgcggccct 360 tcgcccgtgcagagccgccg tctgggccgc agcggggggc gcatggggcg gaaccggacc 420 gccgtggggggcgcgggaga agcccctggg cctccggaga tgggggacac cccacgccag 480 ttcgcaggcgcgaggccgcg ctcgggcggg cgcgctccgg gggtgccgct ctcggggcgg 540 gggcaaccggcggggtcttt gtctgagccg ggctcttgcc aatggggatc gcacggtggg 600 cgcggcgtagcccccgtcag gcccggtggg ggctggggcg ccatgcgcgt gcgcgctggt 660 cctttgggcgctaactgcgt gcgcgctggg aattggcgct aattgcgcgt gcgcgctggg 720 actcaatggcgctaatcgcg cgtgcgttct ggggcccggg cgcttgcgcc acttcctgcc 780 cgagccgctggcgcccgagg gtgtggccgc tgcgtgcgcg cgcgcgaccc ggtcgctgtt 840 tgaaccgggcggaggcgggg ctggcgcccg gttgggaggg ggttggggcc tggcttcctg 900 ccgcgcgccgcggggacgcc tccgaccagt gtttgccttt tatggtaata acgcggccgg 960 cccggcttcctttgtcccca atctgggcgc gcgccggcgc cccctggcgg cctaaggact 1020 cggcgcgccggaagtggcca gggcgggggc gacttcggct cacagcgcgc ccggctattc 1080 tcgcagctcaccatggatga tgatatcgcc gcgctcgtcg tcgacaacgg ctccggcatg 1140 tgcaaggccggcttcgcggg cgacgatgcc ccccgggccg tcttcccctc catcgtgggg 1200 cgccccaggcaccaggtagg ggagctggct gggtggggca gccccgggag cgggcgggag 1260 gcaagggcgctttctctgca caggagcctc ccggtttccg gggtgggctg cgcccgtgct 1320 cagggcttcttgtcctttcc ttcccagggc 
gtgatggtgg gcatgggtca gaaggattcc 1380 tatgtgggcgacgaggccca gagcaagaga ggcatcctca ccctgaagta ccccatcgag 1440 cacggcatcgtcaccaactg ggacgacatg gagaaaatct ggcaccacac cttctacaat 1500 gagctgcgtgtggctcccga ggagcacccc gtgctgctga ccgaggcccc cctgaacccc 1560 aaggccaaccgcgagaagat gacccaggtg agtggcccgc tacctcttct ggtggccgcc 1620 tccctccttcctggcctccc ggagctgcgc cctttctcac tggttctctc ttctgccgtt 1680 ttccgtaggactctcttctc tgacctgagt ctcctttgga actctgcagg ttctatttgc 1740 tttttcccagatgagctctt tttctggtgt ttgtctctct gactaggtgt ctgagacagt 1800 gttgtgggtgtaggtactaa cactggctcg tgtgacaagg ccatgaggct ggtgtaaagc 1860 ggccttggagtgtgtattaa gtaggcgcac agtaggtctg aacagactcc ccatcccaag 1920 accccagcacacttagccgt gttctttgca ctttctgcat gtcccccgtc tggcctggct 1980 gtccccagtggcttccccag tgtgacatgg tgcatctctg ccttacagat catgtttgag 2040 accttcaacaccccagccat gtacgttgct atccaggctg tgctatccct gtacgcctct 2100 ggccgtaccactggcatcgt gatggactcc ggtgacgggg tcacccacac tgtgcccatc 2160 tacgaggggtatgccctccc ccatgccatc ctgcgtctgg acctggctgg ccgggacctg 2220 actgactacctcatgaagat cctcaccgag cgcggctaca gcttcaccac cacggccgag 2280 cgggaaatcgtgcgtgacat taaggagaag ctgtgctacg tcgccctgga cttcgagcaa 2340 gagatggccacggctgcttc cagctcctcc ctggagaaga gctacgagct gcctgacggc 2400 caggtcatcaccattggcaa tgagcggttc cgctgccctg aggcactctt ccagccttcc 2460 ttcctgggtgagtggagact gtctcccggc tctgcctgac atgagggtta cccctcgggg 2520 ctgtgctgtggaagctaagt cctgccctca tttccctctc aggcatggag tcctgtggca 2580 tccacgaaactaccttcaac tccatcatga agtgtgacgt ggacatccgc aaagacctgt 2640 acgccaacacagtgctgtct ggcggcacca ccatgtaccc tggcattgcc gacaggatgc 2700 agaaggagatcactgccctg gcacccagca caatgaagat caaggtgggt gtctttcctg 2760 cctgagctgacctgggcagg tcagctgtgg ggtcctgtgg tgtgtgggga gctgtcacat 2820 ccagggtcctcactgcctgt ccccttccct cctcagatca ttgctcctcc tgagcgcaag 2880 tactccgtgtggatcggcgg ctccatcctg gcctcgctgt ccaccttcca gcagatgtgg 2940 atcagcaagcaggagtatga cgagtccggc ccctccatcg tccaccgcaa atgcttctag 3000 gcggactatgacttagttgc gttacaccct ttcttgacaa aacctaactt gcgcagaaaa 3060 
caagatgagattggcatggc tttatttgtt ttttttgttt tgttttggtt tttttttttt 3120 ttttggcttgactcaggatt taaaaactgg aacggtgaag gtgacagcag tcggttggag 3180 cgagcatcccccaaagttca caatgtggcc gaggactttg attgcattgt tgttttttta 3240 atagtcattccaaatatgag atgcattgtt acaggaagtc ccttgccatc ctaaaagcca 3300 ccccacttctctctaaggag aatggcccag tcctctccca agtccacaca ggggaggtga 3360 tagcattgctttcgtgtaaa ttatgtaatg caaaattttt ttaatcttcg ccttaatact 3420 tttttattttgttttatttt gaatgatgag ccttcgtgcc cccccttccc cctttttgtc 3480 ccccaacttgagatgtatga aggcttttgg tctccctggg agtgggtgga ggcagccagg 3540 gcttacctgtacactgactt gagaccagtt gaataaaagt gcacacctta aaaatgaggc 3600 caagtgtgactttgtggtgt ggctgggttg ggggcagcag agggtg 3646 35 318 DNA Homo sapiensunsure (9) unsure (51) unsure (119) unsure (247) unsure (301) unsure(313) 35 ctcgatttng ggaagttgta gactgcacaa ttaaaacaga tccagtcactnggagatcaa 60 gaggatttgg atttgtgctt ttcaaagatg ctgctagtgt tgataaggttttggaactna 120 aagaacacaa actggatggc aaattgatag atcccaaaag ggccaaagctttaaaaggga 180 aagaacctcc caaaaaggtt tttgtgggtg gattgagccc ggatacttctgaagaacaaa 240 ttaaagnata ttttggagcc tttggagaga ttgaaaatat tgaacttcccatggatacaa 300 naacaaattg aanggaag 318 36 1291 DNA Homo sapiens 36gatctcttcc gccgccattt taaatccagc tccatacaac gctccgccgc cgctgctgcc 60gcgacccgga ctgcgcgcca gcacccccct gccgacagct ccgtcactat ggaggatatg 120aacgagtaca gcaatataga ggaattcgca gagggatcca agatcaacgc gagcaagaat 180cagcaggatg acggtaaaat gtttattgga ggcttgagct gggatacaag caaaaaagat 240ctgacagagt acttgtctcg atttggggaa gttgtagact gcacaattaa aacagatcca 300gtcactggga gatcaagagg atttggattt gtgcttttca aagatgctgc tagtgttgat 360aaggttttgg aactgaaaga acacaaactg gatggcaaat tgatagatcc caaaagggcc 420aaagctttaa aagggaaaga acctcccaaa aaggtttttg tgggtggatt gagcccggat 480acttctgaag aacaaattaa agaatatttt ggagcctttg gagagattga aaatattgaa 540cttcccatgg atacaaaaac aaatgaaaga agaggatttt gttttatcac atatactgat 600gaagagccag taaaaaaatt gttagaaagc agataccatc aaattggttc tgggaagtgt 660gaaatcaaag ttgcacaacc caaagaggta tataggcagc aacagcaaca 
acaaaaaggt 720ggaagaggtg ctgcagctgg tggacgaggt ggtacgaggg gtcgtggccg aggtcagggc 780caaaactgga accaaggatt taataactat tatgatcaag gatatggaaa ttacaatagt 840gcctatggtg gtgatcaaaa ctatagtggc tatggcggat atgattatac tgggtataac 900tatgggaact atggatatgg acagggatat gcagactaca gtggccaaca gagcacttat 960ggcaaggcat ctcgaggggg tggcaatcac caaaacaatt accagccata ctaaaggaga 1020acattggaga aaacaggagg agatgttaaa gtaacccatc ttgcaggacg acattgaaga 1080ttggtcttct gttgatctaa gatgattatt ttgtaaaaga ctttctagtg tacaagacac 1140cattgtgtcc aactgtatat agctgccaat tagttttctt tgtttttact ttgtcctttg 1200ctatctgtgt tatgactcaa tgtggatttg tttatacaca ttttatttgt atcatttcat 1260gttaaacctc aaataaatgc ttccttatgt g 1291 37 2439 DNA Homo sapiens 37gaattcgcag agggatccaa gatcaacgcg agcaagaatc agcaggatga cggtaaaatg 60tttattggag gcttgagctg ggatacaagc aaaaaagatc tgacagagta cttgtctcga 120tttggggaag ttgtagactg cacaattaaa acagatccag tcactgggag atcaagagga 180tttggatttg tgcttttcaa agatgctgct agtgttgata aggttttgga actgaaagaa 240cacaaactgg atggcaaatt gatagatccc aaaagggcca aagctttaaa agggaaagaa 300cctcccaaaa aggtttttgt gggtggattg agcccggata cttctgaaga acaaattaaa 360gaatattttg gagcctttgg agagattgaa aatattgaac ttcccatgga tacaaaaaca 420aatgaaagaa gaggattttg ttttatcaca tatactgatg aagagccagt aaaaaaattg 480ttagaaagca gataccatca aattggttct gggaagtgtg aaatcaaagt tgcacaaccc 540aaagaggtat ataggcagca acagcaacaa caaaaaggtg gaagaggtgc tgcagctggt 600ggacgaggtg gtacgagggg tcgtggccga ggtcagggcc aaaactggaa ccaaggattt 660aataactatt atgatcaagg atatggaaat tacaatagtg cctatggtgg tgatcaaaac 720tatagtggct atggcggata tgattatact gggtataact atgggaacta tggatatgga 780cagggatatg cagactacag tggccaacag agcacttatg gcaaggcatc tcgagggggt 840ggcaatcacc aaaacaatta ccagccatac taaaggagaa cattggagaa aacaggagga 900gatgttaaag taacccatct tgcaggacga cattgaagat tggtcttctg ttgatctaag 960atgattattt tgtaaaagac tttctagtgt acaagacacc attgtgtcca actgtatata 1020gctgccaatt agttttcttt gtttttactt tgtcctttgc tatctgtgtt atgactcaat 1080gtggatttgt ttatacacat tttatttgta tcatttcatg 
ttaaacctca aataaatgct 1140tccttatgtg attgcttttc tgcgtcaggt actacatagc tctgtaaaaa atgtaattta 1200aaataagcaa taattaaggc acagttgatt ttgtagagta ttggtccata cagagaaact 1260gtggtccttt ataaatagcc agccagcgtc accctcttct ccaatttgta ggtgtatttt 1320atgctcttaa ggcttcatct tctccctgta actgagattt ctaccacacc tttgaacaat 1380gttctttccc ttctggttat ctgaagactg tcctgaaagg aagacataag tgttgtgatt 1440agtagaagct ttgtaatcat aacacaatga gtaattcttg tataaaagtt cagatacaaa 1500aggagcactg taaaactggt aggagctatg gtttaagagc attggaagta gttacaactc 1560aaggattttg gtagaaaggt atgagtttgg tcgaaaaatt aaaatagtgg caaaataaga 1620tttagttgtg ttttctcaga gccgccacaa gattgaacaa aatgttttct gtttgggcat 1680cctgaggaag ttgtattagc tgttaatgct ctgtgagttt agagaaaagt cttgatagta 1740aatctagttt ttgacacagt gcatgaacta agtagttaaa tatttacata ttcagaaagg 1800aatagtggaa aaggtatctt ggttatgaca aagtcattac aaatgtgact aagtcattac 1860aaatgtgact gagtcattac agtggaccct ctgggtgcat tgaaaagaat ccgttttata 1920tccaggtttc agaggacctg gaataataat aagctttgga ttttgcattc agtgtagttg 1980gattttggga ccttggcctc agtgttattt actgggattg gcatacgtgt tcacaggcag 2040agtagttgat ctcacacaac gggtgatctc acaaaactgg taagtttctt atgctcatga 2100gccctccctt ttttttttta atttggtgcc tgcaactttc ttaacaatga ttctacttcc 2160tgggctatca cattataatg ctcttggcct cttttttgct gctgttttgc tattcttaaa 2220cttaggccaa gtaccaatgt tggctgttag aagggattct gttcattcaa catgcaactt 2280tagggaatgg aagtaagttc atttttaagt tgtgtggtca gtaggtgcgg tgtctagggt 2340agtgaatcct gtaagttcaa atttatgatt aggtgacgag ttgacattga gattgtcctt 2400ttcccctgat caaaaaaatg aataaagcct ttttaaacg 2439 38 459 DNA Homo sapiensunsure (426) unsure (445) 38 ttttacagat ctttttgact atcttcctct cactgccttggtggatgggc agatcttctg 60 tctacatggt ggtctctcgc catctataga tacactggatcatatcagag cacttgatcg 120 cctacaagaa gttccccatg agggtccaat gtgtgacttgctgtggtcag atccagatga 180 ccgtggtggt tggggtatat ctcctcgagg agctggttacacctttgggc aagatatttc 240 tgagacattt aatcatgcca atggcctcac gttggtgtctagagctcacc agctagtgat 300 ggagggatat aactggtgcc atgaccggaa tgtagtaacgattttcagtg 
ctccaaacta 360 ttgttatcgt tgtggtaacc aagctgcaat catgggaacttgacgatact ctaaaatact 420 ctttcntgca gttttgaccc agcanctcgt agggccgag 45939 1787 DNA Homo sapiens 39 gagagctcgg ctctcggagg aggaggcgca cggccagcggcagtactgcg gtgagagcca 60 gcggccagcg ccacgctcaa cagccgccag aagtacacgaggaaccggcg gcggcgtgtg 120 cgtgtaagcc ggcggcggcg cgggaggagc cggagcggcagccggctggg gcgggtggca 180 tcatggacga gaaggtgttc accaaggagc tggaccagtggatcgagcag ctgaacgagt 240 gcaagcagct gtccgagtcc caggtcaaga gcctctgcgagaaggctaaa gaaatcctga 300 caaaagaatc caacgtgcaa gaggttcgat gtccagttactgtctgtgga gatgtgcatg 360 ggcaatttca tgatctcatg gaactgttta gaattggtggcaaatcacca gatacaaatt 420 acttgtttat gggagattat gttgacagag gatattattcagttgaaaca gttacactgc 480 ttgtagctct taaggttcgt taccgtgaac gcatcaccattcttcgaggg aatcatgaga 540 gcagacagat cacacaagtt tatggtttct atgatgaatgtttaagaaaa tatggaaatg 600 caaatgtttg gaaatatttt acagatcttt ttgactatcttcctctcact gccttggtgg 660 atgggcagat cttctgtcta catggtggtc tctcgccatctatagataca ctggatcata 720 tcagagcact tgatcgccta caagaagttc cccatgagggtccaatgtgt gacttgctgt 780 ggtcagatcc agatgaccgt ggtggttggg gtatatctcctcgaggagct ggttacacct 840 ttgggcaaga tatttctgag acatttaatc atgccaatggcctcacgttg gtgtctagag 900 ctcaccagct agtgatggag ggatataact ggtgccatgaccggaatgta gtaacgattt 960 tcagtgctcc aaactattgt tatcgttgtg gtaaccaagctgcaatcatg gaacttgacg 1020 atactctaaa atactctttc ttgcagtttg acccagcacctcgtagaggc gagccacatg 1080 ttactcgtcg taccccagac tacttcctgt aatgaaattttaaacttgta cagtattgcc 1140 atgaaccata tatcgaccta atggaaatgg gaagagcaacagtaactcca aagtgtcaga 1200 aaatagttaa cattcaaaaa acttgttttc acatggaccaaaagatgtgc catataaaaa 1260 tacaaagcct cttgtcatca acagccgtga ccactttagaatgaaccagt tcattgcatg 1320 ctgaagcgac attgttggtc aagaaaccag tttctggcatagcgctattt gtagttactt 1380 ttgtttctct gagagactgc agataataag atgtaaacattaacacctcg tgaatacaat 1440 ttaacttcca tttagctata gctttactca gcatgactgtagataaggat agcagcaaac 1500 aatcattgga gcttaatgaa catttttaaa aataattaccaaggcctccc ttctacttgt 1560 gagttttgaa attgttcttt ttattttcag 
ggataccgtttaatttaatt atatgatttg 1620 tctgcactca gtttattccc tactcaaatc tcagccccatgttgttcttt gttattgtca 1680 gaacctggtg agttgttttg aacagaactg ttttttccccttcctgtaag acgatgtgac 1740 tgcacaagag cactgcagtg tttttcataa taaacttgtgaactaac 1787 40 452 DNA Homo sapiens unsure (33)..(34) unsure (59)unsure (82) unsure (112) unsure (126) unsure (164) unsure (184) unsure(225) unsure (244) unsure (253) unsure (272) unsure (307) unsure (316)unsure (329) unsure (335) unsure (381) unsure (396) unsure (417) unsure(422) unsure (429) unsure (448) 40 gtttacagat gccacttagt tacactggttttnntttttc agtctcatct gggttgganc 60 caaagacatt cagaggcatg gnaagaggcaaagcatcaga catctcattg gnggcaggta 120 cttccngact actgtaccac ctgctgtatccttccccacc tcancacccc caaagccatt 180 tagngccaaa tgctacagta aaaacccaatgcatttacat aaaanaatgc ctaactgcat 240 attnacattt ttnagaaaaa aaatcccattangctcttct agaaagttat ggcaggaaag 300 gtaaggncca aggctntgag caagccatntgtggnaactt aaagtagatg agcactgagt 360 ttctccatag ttggaaaaaa ngccacactgagcccncttt tcccgtggag ggcaagntga 420 gnccctccnt ttataccccg ttgagatntc ag452 41 263 DNA Homo sapiens unsure (214) unsure (231) unsure (238) 41gagaaaaggg ttggggagaa gcctctgcag tcctggaaga tgtggggttc tgggtgagag 60gcatcagccc cacaagtatg tttttgtgtc ttaagatagc agtttacttt gaaaaagtga 120aaaaggcttc cgggctgtcc tctgcccagt gagatggagg acgctagaga aagtgctgag 180tgtcccgaga gaggcccccg agccagtgca tggnaggtcc ttcggcctgg ntcagctngg 240ctgcaggatg cccactttga gga 263 42 3049 DNA Homo sapiens 42 cccgcgggcaggggcggcga gtgcgcgggc cgccgccctt ctcggcgggc agcgcgcgag 60 gaccaggccgaggaggaagt ggcggcggcg gcggcgggct ccccgcccga ggaggaagat 120 gcagacctttctgaaaggga agagagttgg ctactggctg agcgagaaga aaatcaagaa 180 gctgaatttccaggctttcg ccgagctgtg caggaagcga gggatggagg ttgtgcagct 240 gaaccttagccggccgatcg aggagcaggg ccccctggac gtcatcatcc acaagctgac 300 tgacgtcatccttgaagccg accagaatga tagccagtcc ctggagctgg tgcacaggtt 360 ccaggagtacatcgatgccc accctgagac catcgtcctg gacccgctcc ctgccatcag 420 aaccctgcttgaccgctcca agtcctatga gctcatccgg aagattgagg cctacatgga 
480 agacgacaggatctgctcgc cacccttcat ggagctcacg agcctgtgcg gggatgacac 540 catgcggctgctggagaaga acggcttgac tttcccattc atttgcaaaa ccagagtggc 600 tcatggcaccaactctcacg agatggctat cgtgttcaac caggagggcc tgaacgccat 660 ccagccaccctgcgtggtcc agaatttcat caaccacaac gccgtcctgt acaaggtgtt 720 cgtggttggcgagtcctaca ccgtggtcca gaggccctca ctcaagaact tctccgcagg 780 cacatcagaccgtgagtcca tcttcttcaa cagccacaac gtgtcaaagc cggagtcgtc 840 atcggtcctgacggagctgg acaagatcga gggcgtgttc gagcggccga gcgacgaggt 900 catccgggagctctcccggg ccctgcggca ggcactgggc gtgtcactct tcggcatcga 960 catcatcatcaacaaccaga cagggcagca cgccgtcatt gacatcaatg ccttcccagg 1020 ctacgagggcgtgagcgagt tcttcacaga cctcctgaac cacatcgcca ctgtcctgca 1080 gggccagagcacagccatgg cagccacagg ggacgtggcc ctgctgaggc acagcaagct 1140 tctggccgagccggcgggcg gcctggtggg cgagcggaca tgcaacgcca gccccggctg 1200 ctgcggcagcatgatgggcc aggacgcgcc ctggaaagct gaggccgacg cgggcggcac 1260 cgccaagctgccgcaccaga gactcggctg caacgccggc gtgtctccca gcttccagca 1320 gcattgtgtggcctccctgg ccaccaaggc ctcctcccag tagccacgga gccgggaccc 1380 agagggcagcgcaggcgcag gagcacaccc gctgggccag cagctcccaa cggcgatgct 1440 actactaagaatccccagtg atctgattct tctgtttttt aatttttaac ctgattttct 1500 gatgtcatgatctaaatgag gggtagaaga gagtaccagg tggtccaccg ttggggagcg 1560 gggccgtccgcctgctctct actgtgcaga cctcctaact gagtttacac acgcttgtgt 1620 tgcaacactaggtctggatg ggaggtgagg ggggtgcgta tactgccatg ccagtgtctg 1680 tgcacatccctgtctgttgt ctccatggcc actgtggact gggacccttg aagcctgccc 1740 atgtgggtgtgggaggctga tcagtgcgtg tgagagtggc ttcccttctg cctgactccc 1800 cactccctgacctgcccctt ccttgttttt cctcctactg gtctccacca aggctttgtt 1860 agcccccaccctgcctggtg tgcagctaac ccctccctcc ccacagccag aggaggccac 1920 agacccctcagggagttccg cgctggggtc tgggctgtgc tccctcacta aagggaagga 1980 aaggaagctgggcgtcctcc gggcccccca acacacgtcc catttagccc tgcacagcgg 2040 tctccttcccctaagccagc actgctgctc cctggagccg ggaaggaggc tgcctggctg 2100 gaggccgagccgatgggcct gtgctgagga tttgtgctgt gatttgggca aatcattcca 2160 ggtctttgggcctccacccc ctcgtctcta gtggacattt 
gagatcagag agcaccacag 2220 ggctggctttgtgccctaac ccctgggatg cagcctgcct ttccataaag tcacctaggt 2280 gaggataggcgcgggagcct cggcatgaca ccatggagat cggggccctc ttcccagtgg 2340 gttcactccttttcacacct gctgggtccc tcctcgccca gcaggcctgg tccacctctc 2400 attgcaagcccgcaagcact gagccgagta aggtgcttag tgtgagccac ccgcccccca 2460 tagcttctgcacacctcaga ctcaccccat caccttggca gcaaagcact gctctgccgt 2520 ctgacccctgatccaggcag cagccccctc cgcagagaaa agggttgggg agaagcctct 2580 gcagtcctggaagatgtggg gtgctgggtg agaggcatca gcccccacaa gtatgttttt 2640 gtgtcttaagatagcagttt actttgaaaa agtgaaaaag gcttccgggc tgtcctctgc 2700 ccagtgagatggaggacgct agagaaagtg ctgagtgtcc cgagagaggc ccccgagcca 2760 gtgcatggaggtcttcggcc tggctcagct gggctgcagg atgcccactt tgaggaggga 2820 ggcacagggcttgggcgagg ggcagaggcc atcagaactg cccggctttt ttggaaactg 2880 aggacccaacaactaaccac gtttacacga cttgagtttt gaaccccgat taatgtctgt 2940 acgtcacctttcctagttct gaccctgagc cctggggaac aggaaagcgt ggctggcctc 3000 ttgcactgctttgtctccaa aataaactac tgaaatcaaa ccgcatttc 3049 43 417 DNA Homo sapiensunsure (198) unsure (260) unsure (299) unsure (344) unsure (373) unsure(378) unsure (384) unsure (410) 43 ggttgagccc tacaactgca tcctcaccacccacaccacc ctggagcact ctgattgtgc 60 cttcatggta gacaatgagg ccatctatgacatctgtcgt agaaacctcg atatcgagcg 120 cccaacctac accaacctta accgccttattagccagatt gtgtcctcca tcactgcttc 180 cctgagattt gatggagncc tgaatgttgacctgacagaa ttccagacca acctgggtgc 240 cctacccccg catccacttn cctctggccacatatgcccc tgtcatctct gctgagaang 300 cctaccacga acagcttact gtagtagagatcaccaatgc ttgntttgag ccagccaacc 360 agatggtgaa atntggancc ttgncattggtaaattacat ggggtttgcn gtctgtt 417 44 1596 DNA Homo sapiens 44 tgtcggggacggtaaccggg acccgtgctc tgctcctgtc gccttcgcct cctgaatccc 60 tagccatatgcgtgagtgca tctccatcca cgttggccag gctggtgtcc agattggcaa 120 tgcctgctgggagctctact gcctggaaca cggcatccag cccgatggcc agatgccaag 180 tgacaagaccattgggggag gagatgactc cttcaacacc ttcttcagtg agacgggcgc 240 tggcaagcacgtgccccggg ctgtgtttgt agacttggaa cccacagtca ttgatgaagt 300 tcgcactggcacctaccgcc 
agctcttcca ccctgagcag ctcatcacag gcaaggaaga 360 tgctgccaataactatgccc gagggcacta caccattggc aaggagatca ttgaccttgt 420 gttggaccgaattcgcaagc tggctgacca gtgcacccgt cttcagggct tcttggtttt 480 ccacagctttggtgggggaa ctggttctgg gttcacctcc ctgctcatgg aacgcctgtc 540 agttgattatggcaagaaat ccaagctgga gttctccatt tacccggcac cccaggtttc 600 cacagctgtagttgagccct acaactccat cctcaccacc cacaccaccc tggagcactc 660 tgattgtgccttcatggtag acaatgaggc catctatgac atctgtcgta gaaacctcga 720 tatcgagcgcccaacctaca ctaaccttaa ccgccttatt agccagattg tgtcctccat 780 cactgcttccctgagatttg atggagccct gaatgttgac ctgacagaat tccagaccaa 840 cctggtcccctacccccgca tccacttccc tctggccaca tatgcccctg tcatctctgc 900 tgagaaagcctaccatgaac agctttctgt agcagacatc accaatgctt gctttgagcc 960 agccaaccagatggtgaaat gtgaccctgg ccatggtaaa tacatggctt gctgcctgtt 1020 gtaccgtggtgacgtggttc ccaaagatgt caatgctgcc attgccacca tcaaaaccaa 1080 gcgcacgatccagtttgtgg attggtgccc cactggcttc aaggttggca tcaactacca 1140 gcctcccactgtggtgcctg gtggagacct ggccaaggta cagagagctg tgtgcatgct 1200 gagcaacaccacagccattg ctgaggcctg ggctcgcctg gaccacaagt ttgacctgat 1260 gtatgccaagcgtgcctttg ttcactggta cgtgggtgag gggatggagg aaggcgagtt 1320 ttcagaggcccgtgaagata tggctgccct tgagaaggat tatgaggagg ttggtgtgga 1380 ttctgttgaaggagagggtg aggaagaagg agaggaatac taattatcca ttccttttgg 1440 ccctgcagcatgtcatgctc ccagaatttc agcttcagct taactgacag atgttaaagc 1500 tttctggttagattgttttc acttggtgat catgtctttt ccatgtgtac ctgtaatatt 1560 tttccatcatatctcaaagt aaagtcatta acatca 1596 45 4276 DNA Homo sapiens 45 ctgtgacccagaagtcttcg aattcactgg tttttcagac tctgccacgg cacatgcgac 60 gaagagccatgagccacaac gtcaaacgcc ttcccagacg gttacaggag attgcccaga 120 aagaggcggagaaagccgta catcagaaaa aagaacattc aaaaaataaa tgccataaag 180 ctcgaagatgtcacatgaac cggacgctag aatttaaccg tagacaaaag aagaacattt 240 ggttagaaactcacatctgg cacgccaagc ggtttcatat ggtcaagaag tggggctact 300 gccttggggagaggccaaca gtcaagagcc acagagcctg ctatcgagcc atgacgaacc 360 ggtgcctcctgcaggattta tcctattact gttgtttgga gttgaaaggc aaagaggaag 420 
aaatactaaaggcgctttct ggaatgtgta acatagacac agggctgacg tttgcagcag 480 ttcactgcttgtctggaaag cgccaaggga gccttgtgct ttatcgggtg aataaatatc 540 ccagagaaatgcttgggcct gttacgttta tctggaagtc ccagaggacc ccgggtgacc 600 cttctgagagcaggcagctg tggatctggc tgcatccaac ccttaaacag gatatcttag 660 aggaaataaaagcagcgtgc cagtgtgtgg aacccatcaa atcagctgtc tgcatcgctg 720 acccacttccaacaccatcc caagaaaaaa gccaaactga attgcctgac gagaaaattg 780 gcaagaaaagaaaaaggaaa gatgatggag aaaatgctaa accaattaaa aaaattatcg 840 gtgatggaactagagatcca tgtctaccat actcttggat ctctccaacc acaggcatta 900 taatcagcgatttgacgatg gagatgaaca gattccggct gattgggcca ctttcccact 960 ccatcctaactgaagcaata aaagctgctt ctgtccacac tgtgggagag gacacagagg 1020 agacacctcaccgctggtgg atagaaacct gtaagaaacc tgacagcgtt tcccttcatt 1080 gcagacaagaagccattttc gagttgttgg gaggaataac atcaccagca gaaattccgg 1140 caggtactattctgggactg acagttgggg atcctcgaat aaatttgccc caaaagaagt 1200 ccaaagctttgcccaatcca gaaaaatgcc aagataatga gaaagttaga cagctgcttc 1260 tggagggtgtgcctgtggaa tgtacgcata gctttatctg gaaccaagat atctgtaaga 1320 gtgtcacagagaataaaatc tcggatcagg atttaaaccg gatgaggagt gaattgctgg 1380 tgcctgggtcacagcttatt ttaggtcccc atgaatccaa gatacctata cttttgattc 1440 agcagccaggaaaagtgact ggtgaagatc gactaggctg gggaagtggc tgggatgtcc 1500 tactcccaaagggctggggc atggctttct ggattccatt tatttatcga ggtgtgagag 1560 tcggagggttgaaagagtct gcagtgcatt ctcagtataa gaggtcgcct aatgtcccag 1620 gcgattttccagactgccct gccgggatgc tgtttgcgga agagcaagct aagaatcttc 1680 ttgaaaagtacaaaagacgc cctcctgcaa aacggcccaa ctacgttaag cttggcactc 1740 tggcacctttctgctgtccc tgggagcagt taactcaaga ctgggagtca agagtccagg 1800 cttacgaagaaccttctgta gcttcatctc caaatggtaa ggagagtgac ctaagaagat 1860 ctgaggtgccttgtgctccc atgcctaaaa aaactcatca gccatctgat gaagtgggca 1920 catccatagagcaccccagg gaggcagagg aggtaatgga tgcagggtgt caagaatcgg 1980 cagggcctgagaggatcaca gaccaggagg ccagtgaaaa ccatgttgct gccacaggga 2040 gtcacctctgcgttctcagg agtagaaaat tactgaagca actgtcagcc tggtgtgggc 2100 ccagttctgaggatagtcgg ggaggccggc gagctcccgg 
cagaggccag caaggattga 2160 ccagagaggcttgcctgtcc atcttgggcc acttccccag ggccctggtt tgggtcagcc 2220 tgtccctgctcagcaagggc agccccgagc ctcacaccat gatctgtgtc ccagccaagg 2280 aggacttcctccagctccat gaggactggc attactgtgg gccccaggaa tccaaacaca 2340 gtgacccattcaggagcaag atcctgaaac agaaagagaa gaagaaaagg gagaagaggc 2400 agaagccaggacgtgcctct tctgatggcc cggcggggga agagcccgtg gctgggcagg 2460 aagctctgactctagggctg tggtcaggcc ctctgccgcg tgtgacgttg cactgctcca 2520 gaactctcctaggctttgtg actcagggag atttttccat ggctgttggc tgtggagaag 2580 ccctggggtttgttagcttg acaggcttgc tggatatgct gtccagccag cctgcagcgc 2640 agaggggcttagtgctactg aggcctcccg cctctctgca gtatcgattt gcgaggattg 2700 ctattgaggtgtgaatgcgt gcttgtatcc cagcagggca tagataatac gttattattg 2760 tctgccaagttctacatgtg gagaatctgc ttctgcttta aaatatcatg tgaaactccc 2820 tggaaacaagaataaaaaat tatgtattat gcagatgatg aaatgtttac atcattccag 2880 taatgtcattgattttcatc tttccctgtc cttgctgtaa tacttttaaa ttatttggcc 2940 aaaagctttgtattatgatc tcttggtctg tgtagttgtg gctgaaaata atgagaagct 3000 ctacgagttatcatcccctt tttttgttag aaacaaaggg cttgtcaggt ctatttgaaa 3060 aacctcatagtcatgtgata agcaacaata gatgtttaat gatttcactg ttatagcaga 3120 agacaagagaagacgcttgg cctctgtaca tgaaatatgg gctcctgatg gacctcattc 3180 aattctgtactgtgatttcc atgccgaaca actcaagcct taaagagaga aatcatggac 3240 aactgatttctgcctgtttt caggcaggca cagtttatgg cgtcagtgct aggctggaat 3300 tagaaagtgggggtctatga cgtggacttc ctgactcttt gatctctttg ttgttgacca 3360 acacttgatcctactagtta cttaattttt ttaagtaaaa aattattatt attttgtttc 3420 tgcaaagattttctcaaagc catagaggag catttctcag aatatgttct atgatatgtg 3480 tcacctaaaaaagtaagaga ttccaaggtc aggttgatat ggaaactcta ggttaaataa 3540 agttaagcatttctttatga aagaacttct ggaaacttcc atgtgataat gtgcattgcg 3600 gatctctaggaaggaaatga tagtgtatag tattttctaa atacttgtga ttcctaaagt 3660 tctcttacaaggagcccttt gtaggaccag tgttcttagt agcgcgcttt gggcagtgtg 3720 gctgtgtagtgcatagctac ctctgcaagg tgataactaa gccggcaagc tgcctttcaa 3780 cactcatgcagtcacgttgt ccacctgaga ttctcaacag ggtataaaag gaaggtctca 3840 
tcttgcctcacaggaagagt gggctcagtg tggctttttt ccaactatgg agaaactcag 3900 tgctcatctactttaagttt ccacatatgg cttgctcata gccttggtcc ttacctttcc 3960 tgccataactttctagaaga gcttaatggg atttttttct aaaaaatgta aatatgcagt 4020 taggcattattttatgtaaa tgcattgggt ttttactgta gcatttggca ctaaatggct 4080 ttgggggtgatgaggtgggg aaggatacag caggtggtac agtagtcagg aagtacctgc 4140 caccaatgagatgtctgatg ctttgcctct taccatgcct ctgaatgtct ttggatccaa 4200 cccagatgagactgaaaaaa aaaaaacagt gtaactaagt ggcatctgta aacagaataa 4260 atgaaaatgtcacctg 4276 46 10 DNA Artificial Sequence Description of ArtificialSequence Primer 46 gtagcccagc 10 47 10 DNA Artificial SequenceDescription of Artificial Sequence Primer 47 gccacccaga 10 48 14 DNAArtificial Sequence Description of Artificial Sequence Primer 48acgaagaaga agag 14 49 10 DNA Artificial Sequence Description ofArtificial Sequence Primer 49 aggggcacca 10 50 23 DNA ArtificialSequence Description of Artificial Sequence Primer 50 aatgagggggacaaatggga agc 23 51 25 DNA Artificial Sequence Description ofArtificial Sequence Primer 51 ggagagccct tcctcagaca tgaag 25 52 25 DNAArtificial Sequence Description of Artificial Sequence Primer 52tgacaaaatg gtgacaggta gctgg 25 53 24 DNA Artificial Sequence Descriptionof Artificial Sequence Primer 53 aagtccacac ctcctcagac agcc 24 54 22 DNAArtificial Sequence Description of Artificial Sequence Primer 54cccagacacc caaacagccg tg 22 55 20 DNA Artificial Sequence Description ofArtificial Sequence Primer 55 tggagcagcc gtgtgtgctg 20 56 16 DNAArtificial Sequence Description of Artificial Sequence Primer 56aagctttttt tttttg 16 57 16 DNA Artificial Sequence Description ofArtificial Sequence Primer 57 aagctttttt ttttta 16 58 16 DNA ArtificialSequence Description of Artificial Sequence Primer 58 aagctttttt tttttc16 59 13 DNA Artificial Sequence Description of Artificial SequencePrimer 59 aagcttgatt gcc 13 60 13 DNA Artificial Sequence Description ofArtificial Sequence Primer 60 aagcttcgac tgt 13 61 13 DNA ArtificialSequence Description of 
Artificial Sequence Primer 61 aagctttggt cag 13
62 13 DNA Artificial Sequence Description of Artificial Sequence Primer 62 aagcttctca acg 13
63 17 DNA Artificial Sequence Description of Artificial Sequence Primer 63 attttttttt tttttta 17
64 17 DNA Artificial Sequence Description of Artificial Sequence Primer 64 gttttttttt ttttttg 17
65 14 DNA Artificial Sequence Description of Artificial Sequence Primer 65 tttttttttt tttv 14
66 11 DNA Artificial Sequence Description of Artificial Sequence Primer 66 ggtgcctttg g 11
67 10 DNA Artificial Sequence Description of Artificial Sequence Primer 67 gcaccagggg 10
68 15 DNA Artificial Sequence Description of Artificial Sequence Primer 68 tttttttttt ttttt 15
69 23 DNA Artificial Sequence Description of Artificial Sequence Primer 69 cacgtcttgg tgccttttgt gtg 23
70 23 DNA Artificial Sequence Description of Artificial Sequence Primer 70 gaagctcagc tcagccctct tcc 23
71 25 DNA Artificial Sequence Description of Artificial Sequence Primer 71 ccagggagac caaaagcctt catac 25
72 23 DNA Artificial Sequence Description of Artificial Sequence Primer 72 cacaggggag gtgatagcat tgc 23
73 25 DNA Artificial Sequence Description of Artificial Sequence Primer 73 gtgcttttca aagatgctgc tagtg 25
74 22 DNA Artificial Sequence Description of Artificial Sequence Primer 74 gctcaatcca cccacaaaaa cc 22
75 23 DNA Artificial Sequence Description of Artificial Sequence Primer 75 tcctctcact gccttggtgg atg 23
76 24 DNA Artificial Sequence Description of Artificial Sequence Primer 76 cacagcaagt cacacattgg accc 24
77 22 DNA Artificial Sequence Description of Artificial Sequence Primer 77 ccaaagacat tcagaggcat gg 22
78 22 DNA Artificial Sequence Description of Artificial Sequence Primer 78 gaggtgggga aggatacagc ag 22
79 23 DNA Artificial Sequence Description of Artificial Sequence Primer 79 gaaaagggtt ggggagaagc ctc 23
80 25 DNA Artificial Sequence Description of Artificial Sequence Primer 80 tctctagcgt cctccatctc actgg 25
81 23 DNA Artificial Sequence Description of Artificial Sequence Primer 81 acaactgcat cctcaccacc cac 23
82 25 DNA Artificial Sequence Description of Artificial Sequence Primer 82 ggacacaatc tggctaataa ggcgg 25
83 16 DNA Artificial Sequence Description of Artificial Sequence Primer 83 aagctttttt tttttg 16
84 16 DNA Artificial Sequence Description of Artificial Sequence Primer 84 aagctttttt tttttc 16
85 16 DNA Artificial Sequence Description of Artificial Sequence Primer 85 aagctttttt ttttta 16

We claim:
1. A method of measuring the level of two or more nucleic acid molecules in a target, comprising: (a) contacting a probe with a target comprising two or more nucleic acid molecules, wherein said nucleic acid molecules are arbitrarily sampled and wherein said arbitrarily sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and (b) detecting the amount of specific binding of said target to said probe.
2. The method of claim 1, wherein said target comprises one or more less abundant nucleic acid molecules of said population.
3. The method of claim 1, wherein said less abundant nucleic acid molecule is less than 10% as abundant as the most abundant nucleic acid molecule in said population.
4. The method of claim 1, wherein said less abundant nucleic acid molecule is less than 1% as abundant as the most abundant nucleic acid molecule in said population.
5. The method of claim 1, wherein said less abundant nucleic acid molecule is less than 0.1% as abundant as the most abundant nucleic acid molecule in said population.
6. The method of claim 1, wherein said less abundant nucleic acid molecule is less than 0.01% as abundant as the most abundant nucleic acid molecule in said population.
7. The method of claim 1, wherein said target is generated using one or more arbitrary oligonucleotides.
8. The method of claim 1, wherein said target is generated using RNA arbitrarily primed polymerase chain reaction (RAP-PCR).
9. The method of claim 1, wherein said target is generated using differential display.
10. The method of claim 1, wherein said target is generated using digestion-ligation.
11. The method of claim 1, wherein said target is generated using a primer comprising an RNA polymerase promoter and an RNA polymerase.
12. The method of claim 11, wherein said RNA polymerase is selected from the group consisting of T7 RNA polymerase, T3 RNA polymerase and SP6 polymerase.
13. The method of claim 1, wherein said target is amplified.
14. The method of claim 13, wherein said amplified target is generated using polymerase chain reaction.
15. The method of claim 1, wherein said target is not amplified.
16. The method of claim 1, wherein said probe is an array of molecules.
17. The method of claim 16, wherein said molecules on said array are nucleic acid molecules.
18. The method of claim 16, wherein said molecules on said array are oligonucleotides.
19. The method of claim 16, wherein said molecules on said array are polypeptides.
20. The method of claim 16, wherein said molecules on said array are peptide-nucleic acids.
21. The method of claim 1, wherein said target comprises 10 or more nucleic acid molecules.
22. The method of claim 1, wherein said target comprises 20 or more nucleic acid molecules.
23. The method of claim 1, wherein said target comprises 50 or more nucleic acid molecules.
24. The method of claim 1, wherein said target comprises 100 or more nucleic acid molecules.
25. The method of claim 1, wherein said target comprises 1000 or more nucleic acid molecules.
26. The method of claim 1, further comprising comparing said amount of specific binding of said target to said probe, wherein said amount of specific binding corresponds to an expression level of said nucleic acid molecules in said target, to an expression level of said nucleic acid molecules in a second target.
27. The method of claim 26, wherein said expression level of said nucleic acid molecules in said second target is known.
28. The method of claim 26, wherein said expression level of said nucleic acid molecules in said second target is determined by contacting said second target with said probe and detecting the amount of specific binding of said probe to said second target.
29. A method of measuring the level of two or more nucleic acid molecules in a target, comprising: (a) contacting a probe with a target comprising two or more nucleic acid molecules, wherein said nucleic acid molecules are statistically sampled and wherein said statistically sampled nucleic acid molecules comprise a subset of the nucleic acid molecules in a population of nucleic acid molecules; and (b) detecting the amount of specific binding of said target to said probe.
30. The method of claim 29, wherein said target comprises one or more less abundant sequences of said population.
31. The method of claim 30, wherein said less abundant sequence is less than 10% as abundant as the most abundant sequence in said population.
32. The method of claim 30, wherein said less abundant sequence is less than 1% as abundant as the most abundant sequence in said population.
33. The method of claim 30, wherein said less abundant sequence is less than 0.1% as abundant as the most abundant sequence in said population.
34. The method of claim 30, wherein said less abundant sequence is less than 0.01% as abundant as the most abundant sequence in said population.
35. The method of claim 29, wherein said statistically sampled target is enhanced for complexity of unrelated nucleic acid molecules.
36. The method of claim 29, wherein said target is generated using one or more statistical oligonucleotides.
37. The method of claim 36, wherein said statistical oligonucleotides are selected based on rank of complexity binding.
38. The method of claim 36, wherein said statistical oligonucleotides are enhanced for complexity binding.
39. The method of claim 29, wherein said target is generated using directed statistical selection.
40. The method of claim 29, wherein said target is generated using Monte-Carlo statistical selection.
41. The method of claim 29, wherein said target is generated using digestion-ligation.
42. The method of claim 29, wherein said target is generated using a primer comprising an RNA polymerase promoter and an RNA polymerase.
43. The method of claim 42, wherein said RNA polymerase is selected from the group consisting of T7 RNA polymerase, T3 RNA polymerase and SP6 polymerase.
44. The method of claim 29, wherein said target is amplified.
45. The method of claim 44, wherein said amplified target is generated using polymerase chain reaction.
46. The method of claim 29, wherein said target is not amplified.
47. The method of claim 29, wherein said probe is an array of molecules.
48. The method of claim 47, wherein said molecules on said array are nucleic acid molecules.
49. The method of claim 47, wherein said molecules on said array are oligonucleotides.
50. The method of claim 47, wherein said molecules on said array are polypeptides.
51. The method of claim 47, wherein said molecules on said array are peptide-nucleic acids.
52. The method of claim 29, wherein said nucleic acid target comprises 10 or more nucleic acid molecules.
53. The method of claim 29, wherein said nucleic acid target comprises 20 or more nucleic acid molecules.
54. The method of claim 29, wherein said nucleic acid target comprises 50 or more nucleic acid molecules.
55. The method of claim 29, wherein said nucleic acid target comprises 100 or more nucleic acid molecules.
56. The method of claim 29, wherein said nucleic acid target comprises 1000 or more nucleic acid molecules.
57. The method of claim 29, further comprising comparing said amount of specific binding of said target to said probe, wherein said amount of specific binding corresponds to an abundance of said nucleic acid molecules in said target, to an abundance of said nucleic acid molecules in a second target.
58. The method of claim 57, wherein said abundance of said nucleic acid molecules in said second target is known.
59. The method of claim 57, wherein said abundance of said nucleic acid molecules in said second target is determined by contacting said second target with said probe and detecting the amount of specific binding of said probe to said second target.
60. A method of identifying two or more differentially expressed nucleic acid molecules associated with a condition, comprising: (a) measuring the level of two or more nucleic acid molecules in a target according to the method of claim 1, wherein said amount of specific binding of said target to said probe corresponds to an expression level of said nucleic acid molecules in said target; (b) comparing said expression level of said nucleic acid molecules in said target to an expression level of said nucleic acid molecules in a second target, whereby a difference in expression level between said targets indicates a condition.
61. The method of claim 60, wherein said condition is associated with a disease state.
62. The method of claim 60, wherein said disease state is selected from the group consisting of cancer, autoimmune disease, infectious disease, aging, developmental disorder, proliferative disorder, and neurological disorder.
63. The method of claim 60, wherein said condition is associated with a treatment.
64. The method of claim 63, wherein said difference in expression level indicates an efficacy of said treatment.
65. The method of claim 63, wherein said difference in expression level indicates a resistance to said treatment.
66. The method of claim 63, wherein said difference in expression level indicates a toxicity of said treatment.
67. The method of claim 60, wherein said condition is associated with a stimulus.
68. The method of claim 67, wherein said stimulus is a chemical.
69. The method of claim 68, wherein said chemical is a drug.
70. The method of claim 67, wherein said stimulus is a growth factor.
71. The method of claim 67, wherein said growth factor is epidermal growth factor (EGF).
72. The method of claim 71, wherein said target comprises a portion of a nucleic acid sequence selected from the group consisting of nucleic acids referenced as SEQ ID NOS:1-45.
73. The method of claim 67, wherein said stimulus is radiation.
74. The method of claim 67, wherein said stimulus is stress.
75. The method of claim 60, wherein said target is derived from skin cells.
76. The method of claim 75, wherein said skin cells comprise keratinocytes.
77. The method of claim 60, wherein said target is derived from a tumor.
78. The method of claim 67, wherein said stimulus is a pathogen.
79. A profile comprising five or more stimulus-regulated nucleic acid molecules.
80. The profile of claim 79, wherein said profile comprises ten or more stimulus-regulated nucleic acid molecules.
81. The profile of claim 79, wherein said profile comprises 100 or more stimulus-regulated nucleic acid molecules.
82. The profile of claim 79, wherein said profile comprises 1000 or more stimulus-regulated nucleic acid molecules.
83. The profile of claim 80, wherein said stimulus is epidermal growth factor.
84. The profile of claim 83, comprising a portion of a nucleotide sequence selected from the group consisting of the nucleotide sequences referenced as SEQ ID NOS:1-45.
85. A profile obtained by the method of claim 1.
86. The profile of claim 85, wherein said profile comprises two or more nucleic acid molecules.
87. The profile of claim 85, wherein said profile comprises 5 or more nucleic acid molecules.
88. The profile of claim 85, wherein said profile comprises 10 or more nucleic acid molecules.
89. The profile of claim 85, wherein said profile comprises 100 or more nucleic acid molecules.
90. A profile obtained by the method of claim 29.
91. The profile of claim 90, wherein said profile comprises two or more nucleic acid molecules.
92. The profile of claim 90, wherein said profile comprises 5 or more nucleic acid molecules.
93. The profile of claim 90, wherein said profile comprises 10 or more nucleic acid molecules.
94. The profile of claim 90, wherein said profile comprises 100 or more nucleic acid molecules.
95. A target comprising a portion of each of the nucleotide sequences referenced as SEQ ID NOS:1-45.
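
By way of illustration only, and not as part of the claims, the abundance thresholds and the comparison between two targets recited above can be sketched in Python. Everything in this sketch is an assumption made for the example: the function names, the clone labels, the signal values, and the two-fold cutoff are hypothetical and are not taken from the specification.

```python
from typing import Dict, List


def relative_abundance(levels: Dict[str, float]) -> Dict[str, float]:
    """Express each molecule's level as a fraction of the most abundant one."""
    top = max(levels.values())
    return {name: value / top for name, value in levels.items()}


def differentially_expressed(
    first: Dict[str, float],
    second: Dict[str, float],
    min_fold: float = 2.0,  # hypothetical cutoff, not from the source
) -> List[str]:
    """Return names whose signal differs by at least min_fold between targets."""
    hits = []
    for name in sorted(first.keys() & second.keys()):
        a, b = first[name], second[name]
        if a <= 0 or b <= 0:
            continue  # a ratio is undefined for non-positive signals
        if max(a, b) / min(a, b) >= min_fold:
            hits.append(name)
    return hits


# Hypothetical binding signals for one probe array and two targets.
treated = {"clone_A": 1000.0, "clone_B": 50.0, "clone_C": 8.0}
untreated = {"clone_A": 950.0, "clone_B": 10.0, "clone_C": 9.0}

fractions = relative_abundance(treated)
rare = [name for name, frac in fractions.items() if frac < 0.01]
changed = differentially_expressed(treated, untreated)
print(rare, changed)
```

In this made-up data set, clone_C falls below 1% of the most abundant molecule, and only clone_B differs by two-fold or more between the two targets. A ratio of raw signals is the simplest possible comparison; real array data would normally be normalized and replicated first.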